3,746 research outputs found

    Distributed-Memory Breadth-First Search on Massive Graphs

    This chapter studies the problem of traversing large graphs in breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm and the recently discovered direction-optimizing algorithm. We analyze the performance and scalability trade-offs of using different local data structures such as CSR and DCSC, enabling in-node multithreading, and using graph decompositions such as 1D and 2D decomposition.
    Comment: arXiv admin note: text overlap with arXiv:1104.451
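To make the terms in the abstract above concrete, here is a minimal single-node sketch of a level-synchronous top-down BFS over a CSR graph. It is not the chapter's distributed implementation: the function name `bfs_top_down` and the array names `row_ptr` and `col_idx` are assumptions chosen for illustration, and no 1D/2D decomposition or multithreading is shown.

```python
def bfs_top_down(row_ptr, col_idx, source):
    """Level-synchronous top-down BFS on a CSR graph.

    row_ptr[u]..row_ptr[u + 1] delimits the neighbours of u inside col_idx.
    Returns a list of BFS levels; unreached vertices keep the value -1.
    """
    n = len(row_ptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # Top-down step: every frontier vertex scans its outgoing edges.
        for u in frontier:
            for k in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[k]
                if level[v] == -1:  # first visit fixes the level
                    level[v] = depth
                    next_frontier.append(v)
        # The frontier swap is the per-level synchronization point.
        frontier = next_frontier
    return level


# Hypothetical undirected example: edges 0-1, 0-2, 1-3; vertex 4 is isolated.
example_row_ptr = [0, 2, 4, 5, 6, 6]
example_col_idx = [1, 2, 0, 3, 0, 1]
example_levels = bfs_top_down(example_row_ptr, example_col_idx, 0)
```

In a distributed setting the frontier swap would typically become a communication step; a direction-optimizing variant would additionally switch to scanning unvisited vertices when the frontier grows large.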

    Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures

    Feltor is a modular and free scientific software package. It allows developing platform-independent code that runs on a variety of parallel computer architectures ranging from laptop CPUs to multi-GPU distributed-memory systems. Feltor consists of both a numerical library and a collection of application codes built on top of the library. Its main targets are two- and three-dimensional drift- and gyro-fluid simulations, with discontinuous Galerkin methods as the main numerical discretization technique. We observe that numerical simulations of a recently developed gyro-fluid model produce non-deterministic results in parallel computations. First, we show how we restore accuracy and bitwise reproducibility algorithmically and programmatically. In particular, we adopt an implementation of the exactly rounded dot product based on long accumulators, which avoids accuracy losses especially in parallel applications. However, reproducibility and accuracy alone fail to indicate correct simulation behaviour. In fact, in the physical model slightly different initial conditions lead to vastly different end states. This behaviour translates to its numerical representation. Pointwise convergence, even in principle, becomes impossible for long simulation times. In a second part, we explore important performance tuning considerations. We identify latency and memory bandwidth as the main performance indicators of our routines. Based on these, we propose a parallel performance model that predicts the execution time of algorithms implemented in Feltor and test our model on a selection of parallel hardware architectures. We are able to predict the execution time with a relative error of less than 25% for problem sizes between 0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth gives a minimum array size per compute node to achieve a scaling efficiency above 50% (both strong and weak).
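One way to read the closing claim of the abstract above is through a simple latency-plus-bandwidth model. The sketch below is an assumption made for illustration, not the performance model from the paper: it takes the run time as `latency + size / bandwidth` and defines efficiency as the bandwidth-only time divided by that total. All function and parameter names are hypothetical.

```python
def predicted_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Assumed model: fixed latency plus the time to stream the data."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def bandwidth_efficiency(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the predicted time spent moving data rather than waiting."""
    ideal = size_bytes / bandwidth_bytes_per_s
    return ideal / predicted_time(size_bytes, latency_s, bandwidth_bytes_per_s)


def min_size_for_half_efficiency(latency_s, bandwidth_bytes_per_s):
    """Array size at which latency and transfer time are equal."""
    return latency_s * bandwidth_bytes_per_s


# Hypothetical hardware numbers: 10 microseconds latency, 10 GB/s bandwidth.
LATENCY_S = 1e-5
BANDWIDTH_BPS = 1e10
threshold_bytes = min_size_for_half_efficiency(LATENCY_S, BANDWIDTH_BPS)
```

Under this assumed model the efficiency is `size / (size + latency * bandwidth)`, so it crosses 50% exactly when the array size equals the latency-bandwidth product, and larger arrays amortize the latency better.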

    JETSPIN: a specific-purpose open-source software for simulations of nanofiber electrospinning

    We present the open-source computer program JETSPIN, specifically designed to simulate the electrospinning process of nanofibers. Its capabilities are shown with proper reference to the underlying model, as well as a description of the relevant input variables and associated test-case simulations. The various interactions included in the electrospinning model implemented in JETSPIN are discussed in detail. The code is designed to exploit different computational architectures, from single to parallel processor workstations. This paper provides an overview of JETSPIN, focusing primarily on its structure, parallel implementations, functionality, performance, and availability.
    Comment: 22 pages, 11 figures. arXiv admin note: substantial text overlap with arXiv:1507.0701

    A Parallel Implementation of the K Nearest Neighbours Classifier in Three Levels: Threads MPI Processes and the Grid

    The work described in this paper tackles the problem of data mining and classification of large amounts of data using the K nearest neighbours classifier (KNN) [1]. The large computing demand of this process is met with a parallel computing implementation specially designed to work in Grid environments of multiprocessor computer farms. The different parallel computing approaches (intra-node, inter-node and inter-organisations) are not sufficient by themselves to face the computing demand of such a big problem. Instead of using parallel techniques separately, we propose to combine the three of them considering the parallelism grain of the different parts of the problem. The main purpose is to complete a 1 month-CPU job in a few hours. The technologies that are being used are the EGEE Grid Computing Infrastructure running the Large Hadron Collider Computing Grid (LCG 2.6) middleware [3], MPI [4] [5] and POSIX [6] threads. Finally, we compare the results obtained with the most popular and widely used tools to understand the importance of this strategy.

    Aparicio Pla, G.; Blanquer Espert, I.; Hernández García, V. (2007). A Parallel Implementation of the K Nearest Neighbours Classifier in Three Levels: Threads MPI Processes and the Grid. In High Performance Computing for Computational Science - VECPAR 2006. Springer Verlag (Germany). 225-235. doi:10.1007/978-3-540-71351-7_18

    [1] Cover, T.M., Hart, P.E.: Nearest neighbour pattern recognition. IEEE Trans. on Information Theory 13(1), 21-27 (1967)
    [2] Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications 15(3) (2001), http://www.globus.org/research/papers/anatomy.pdf
    [3] LCG: World Wide Web Computing Grid. Distributed Production Environment of Physics Data Processing. http://lcg.web.cern.ch/LCG
    [4] Message Passing Interface Forum: MPI: A message-passing interface standard (2003), http://www.mpi-forum.org/
    [5] Gropp, W., et al.: MPI: The Complete Reference. MIT Press, Cambridge (1998)
    [6] Drepper, U., Molnar, I.: The Native POSIX Thread Library for Linux (2003), http://people.redhat.com/drepper/nptl-design.pdf
    [7] Frank, E., Hall, M., L.T.: Weka 3: Data Mining Software in Java (2005), http://www.cs.waikato.ac.nz/ml/wek
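For readers unfamiliar with the classifier named in the abstract above, the following is a minimal serial sketch of K nearest neighbours classification by majority vote. It is a generic textbook version, not the paper's threaded, MPI or Grid implementation; the function name `knn_classify` and the sample data are hypothetical.

```python
import heapq
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pick the indices of the k training points closest to the query
    # (Euclidean distance, brute force over the whole training set).
    nearest = heapq.nsmallest(
        k,
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-cluster training set.
sample_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
sample_labels = ["a", "a", "a", "b", "b", "b"]
label_near_origin = knn_classify(sample_points, sample_labels, (0.2, 0.1))
label_far = knn_classify(sample_points, sample_labels, (5.5, 5.5))
```

Each query is classified independently of the others, which is what makes the workload easy to split across threads, processes or sites: the queries can be partitioned and the per-query results collected afterwards.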

    Analysis of Oct4-dependent transcriptional networks regulating self-renewal and pluripotency in human embryonic stem cells

    The POU domain transcription factor OCT4 is a key regulator of pluripotency in the early mammalian embryo and is highly expressed in the inner cell mass of the blastocyst. Consistent with its essential role in maintaining pluripotency, Oct4 expression is rapidly downregulated during formation of the trophoblast lineage. To enhance our understanding of the molecular basis of this differentiation event in humans, we used a functional genomics approach involving RNA interference-mediated suppression of OCT4 function in a human ESC line and analysis of the resulting transcriptional profiles to identify OCT4-dependent genes in human cells. We detected altered expression of >1,000 genes, including targets regulated directly by OCT4 either positively (NANOG, SOX2, REX1, LEFTB, LEFTA/EBAF, DPPA4, THY1, and TDGF1) or negatively (CDX2, EOMES, BMP4, TBX18, Brachyury [T], DKK1, HLX1, GATA6, ID2, and DLX5), as well as targets for the OCT4-associated stem cell regulators SOX2 and NANOG. Our data set includes regulators of ACTIVIN, BMP, fibroblast growth factor, and WNT signaling. These pathways are implicated in regulating human ESC differentiation and therefore further validate the results of our analysis. In addition, we identified a number of differentially expressed genes that are involved in epigenetics, chromatin remodeling, apoptosis, and metabolism that may point to underlying molecular mechanisms that regulate pluripotency and trophoblast differentiation in humans. Significant concordance between this data set and previous comparisons between inner cell mass and trophectoderm in human embryos indicates that the study of human ESC differentiation in vitro represents a useful model of early embryonic differentiation in humans.