    Towards an Adaptive Skeleton Framework for Performance Portability

    The proliferation of widely available, but very different, parallel architectures makes the ability to deliver good parallel performance on a range of architectures, or performance portability, highly desirable. Irregularly parallel problems, where the number and size of tasks is unpredictable, are particularly challenging and require dynamic coordination. The paper outlines a novel approach to delivering portable parallel performance for irregularly parallel programs. The approach combines declarative parallelism with JIT technology, dynamic scheduling, and dynamic transformation. We present the design of an adaptive skeleton library, with a task graph implementation, JIT trace costing, and adaptive transformations. We outline the architecture of the prototype adaptive skeleton execution framework in Pycket, describing tasks, serialisation, and the current scheduler. We report a preliminary evaluation of the prototype framework using 4 micro-benchmarks and a small case study on two NUMA servers (24 and 96 cores) and a small cluster (17 hosts, 272 cores). Key results include Pycket delivering good sequential performance, e.g. almost as fast as C for some benchmarks; good absolute speedups on all architectures (up to 120 on 128 cores for sumEuler); and that the adaptive transformations do improve performance.

    Dobrushin-Kotecky-Shlosman theorem for polygonal Markov fields in the plane

    We consider the so-called length-interacting Arak-Surgailis polygonal Markov fields with V-shaped nodes, a continuum and isometry-invariant process in the plane sharing a number of properties with the two-dimensional Ising model. For these polygonal fields we establish a low-temperature phase separation theorem in the spirit of the Dobrushin-Kotecky-Shlosman theory, with the corresponding Wulff shape determined to be a disk due to the rotation-invariant nature of the considered model. As an important tool replacing the classical cluster expansion techniques and very well suited for our geometric setting, we use a graphical construction built on a contour birth and death process, following the ideas of Fernandez, Ferrari and Garcia. Comment: 59 pages, new version revised according to the referee's suggestions and now publishe

    The response of diatom central carbon metabolism to nitrogen starvation is different from that of green algae and higher plants

    The availability of nitrogen varies greatly in the ocean and limits primary productivity over large areas. Diatoms, a group of phytoplankton that are responsible for about 20% of global carbon fixation, respond rapidly to influxes of nitrate and are highly successful in upwelling regions. Although recent diatom genome projects have highlighted clues to the success of this group, very little is known about their adaptive response to changing environmental conditions. Here, we compare the proteome of the marine diatom Thalassiosira pseudonana (CCMP 1335) at the onset of nitrogen starvation with that of nitrogen-replete cells using two-dimensional gel electrophoresis. In total, 3,310 protein spots were distinguishable, and we identified 42 proteins increasing and 23 decreasing in abundance (greater than 1.5-fold change; P < 0.005). Proteins involved in the metabolism of nitrogen, amino acids, proteins, and carbohydrates, photosynthesis, and chlorophyll biosynthesis were represented. Comparison of our proteomics data with the transcriptome response of this species under similar growth conditions showed good correlation and provided insight into different levels of response. The T. pseudonana response to nitrogen starvation was also compared with that of the higher plant Arabidopsis (Arabidopsis thaliana), the green alga Chlamydomonas reinhardtii, and the cyanobacterium Prochlorococcus marinus. We have found that the response of diatom carbon metabolism to nitrogen starvation is different from that of other photosynthetic eukaryotes and bears closer resemblance to the response of cyanobacteria.

    A fast semi-direct least squares algorithm for hierarchically block separable matrices

    We present a fast algorithm for linear least squares problems governed by hierarchically block separable (HBS) matrices. Such matrices are generally dense but data-sparse and can describe many important operators including those derived from asymptotically smooth radial kernels that are not too oscillatory. The algorithm is based on a recursive skeletonization procedure that exposes this sparsity and solves the dense least squares problem as a larger, equality-constrained, sparse one. It relies on a sparse QR factorization coupled with iterative weighted least squares methods. In essence, our scheme consists of a direct component, comprised of matrix compression and factorization, followed by an iterative component to enforce certain equality constraints. At most two iterations are typically required for problems that are not too ill-conditioned. For an $M \times N$ HBS matrix with $M \geq N$ having bounded off-diagonal block rank, the algorithm has optimal $\mathcal{O}(M + N)$ complexity. If the rank increases with the spatial dimension as is common for operators that are singular at the origin, then this becomes $\mathcal{O}(M + N)$ in 1D, $\mathcal{O}(M + N^{3/2})$ in 2D, and $\mathcal{O}(M + N^{2})$ in 3D. We illustrate the performance of the method on both over- and underdetermined systems in a variety of settings, with an emphasis on radial basis function approximation and efficient updating and downdating. Comment: 24 pages, 8 figures, 6 tables; to appear in SIAM J. Matrix Anal. App
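
The abstract above mentions enforcing equality constraints through weighted least squares. As a rough illustration of that general idea only, the sketch below uses the plain one-shot weighting method on a tiny dense problem: constraint rows are scaled by a large weight and stacked under the data rows. It is not the paper's sparse, iterative algorithm. The function names, the weight `tau`, and the toy data are all assumptions made for this example.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def weighted_constrained_lstsq(A, b, C, d, tau=1.0e4):
    """Approximate min ||Ax - b|| subject to Cx = d by the weighting method.

    Constraint rows are scaled by tau (hypothetical parameter) and appended
    to the data rows; the stacked problem is then solved through its normal
    equations, which is acceptable for a toy example only.
    """
    rows = [list(r) for r in A] + [[tau * v for v in r] for r in C]
    vals = list(b) + [tau * v for v in d]
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    moment = [sum(r[i] * v for r, v in zip(rows, vals)) for i in range(n)]
    return solve_linear(gram, moment)


# Toy data (assumed): three observations, one constraint x1 + x2 = 1.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
C = [[1.0, 1.0]]
d = [1.0]
x = weighted_constrained_lstsq(A, b, C, d)
```

For this toy problem the exactly constrained minimiser is (0, 1), and the weighted solution approaches it as `tau` grows. A single large weight trades accuracy against conditioning, which is one reason an iterative refinement of the kind the abstract describes can be preferable.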

    Geometry-Oblivious FMM for Compressing Dense SPD Matrices

    We present GOFMM (geometry-oblivious FMM), a novel method that creates a hierarchical low-rank approximation, "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in $N \log N$ or even $N$ time, where $N$ is the matrix size. Compression requires $N \log N$ storage and work. In general, our scheme belongs to the family of hierarchical matrix approximation methods. In particular, it generalizes the fast multipole method (FMM) to a purely algebraic setting by only requiring the ability to sample matrix entries. Neither geometric information (i.e., point coordinates) nor knowledge of how the matrix entries have been generated is required, thus the term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme for hierarchical matrix computations that reduces synchronization barriers. We present results on the Intel Knights Landing and Haswell architectures, and on the NVIDIA Pascal architecture for a variety of matrices. Comment: 13 pages, accepted by SC'1
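
To see why low-rank structure makes matrix-vector products cheap, consider a single matrix that factors exactly as $K = U V^T$ with rank $r$. Applying the factors in sequence costs about $2Nr$ multiplications instead of $N^2$. The sketch below shows only this basic building block, not GOFMM itself; hierarchical schemes generally apply the idea to blocks of the matrix. The helper names, sizes, and random data are assumptions for illustration, and the toy matrix is positive semidefinite rather than strictly SPD.

```python
import random


def matvec(rows, x):
    """Dense matrix-vector product; rows is a list of row lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def transpose(rows):
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*rows)]


def lowrank_matvec(U, V, x):
    """Apply K = U V^T to x without forming K: first V^T x, then U times that."""
    return matvec(U, matvec(transpose(V), x))


random.seed(0)
N, r = 60, 3
U = [[random.gauss(0.0, 1.0) for _ in range(r)] for _ in range(N)]
# Using V = U makes K = U U^T symmetric positive semidefinite.
V = U
K = [[sum(U[i][k] * V[j][k] for k in range(r)) for j in range(N)] for i in range(N)]
x = [random.gauss(0.0, 1.0) for _ in range(N)]

dense = matvec(K, x)            # about N * N multiplications
fast = lowrank_matvec(U, V, x)  # about 2 * N * r multiplications
max_diff = max(abs(a - b) for a, b in zip(dense, fast))
```

Here `max_diff` stays at rounding-error level because the factorisation is exact; with an approximate compression it would instead reflect the approximation error.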

    Teaching programming with computational and informational thinking

    Computers are the dominant technology of the early 21st century: pretty well all aspects of economic, social and personal life are now unthinkable without them. In turn, computer hardware is controlled by software, that is, codes written in programming languages. Programming, the construction of software, is thus a fundamental activity, in which millions of people are engaged worldwide, and the teaching of programming is long established in international secondary and higher education. Yet, going on 70 years after the first computers were built, there is no well-established pedagogy for teaching programming. There has certainly been no shortage of approaches. However, these have often been driven by fashion, an enthusiastic amateurism or a wish to follow best industrial practice, which, while appropriate for mature professionals, is poorly suited to novice programmers. Much of the difficulty lies in the very close relationship between problem solving and programming. Once a problem is well characterised it is relatively straightforward to realise a solution in software. However, teaching problem solving is, if anything, less well understood than teaching programming. Problem solving seems to be a creative, holistic, dialectical, multi-dimensional, iterative process. While there are well established techniques for analysing problems, arbitrary problems cannot be solved by rote, by mechanically applying techniques in some prescribed linear order. Furthermore, historically, approaches to teaching programming have failed to account for this complexity in problem solving, focusing strongly on programming itself and, if at all, only partially and superficially exploring problem solving. Recently, an integrated approach to problem solving and programming called Computational Thinking (CT) (Wing, 2006) has gained considerable currency. CT has the enormous advantage over prior approaches of strongly emphasising problem solving and of making explicit core techniques. 
    Nonetheless, there is still a tendency to view CT as prescriptive rather than creative, engendering scholastic arguments about the nature and status of CT techniques. Programming at heart is concerned with processing information but many accounts of CT emphasise processing over information rather than seeing them as intimately related. In this paper, while acknowledging and building on the strengths of CT, I argue that understanding the form and structure of information should be primary in any pedagogy of programming.

    JIT costing adaptive skeletons for performance portability

    The proliferation of widely available, but very different, parallel architectures makes the ability to deliver good parallel performance on a range of architectures, or performance portability, highly desirable. Irregular parallel problems, where the number and size of tasks is unpredictable, are particularly challenging and require dynamic coordination. The paper outlines a novel approach to delivering portable parallel performance for irregular parallel programs. The approach combines JIT compiler technology with dynamic scheduling and dynamic transformation of declarative parallelism. We specify families of algorithmic skeletons plus equations for rewriting skeleton expressions. We present the design of a framework that unfolds skeletons into task graphs, dynamically schedules tasks, and dynamically rewrites skeletons, guided by a lightweight JIT trace-based cost model, to adapt the number and granularity of tasks for the architecture. We outline the system architecture and prototype implementation in Racket/Pycket. As the current prototype does not yet automatically perform dynamic rewriting, we present results based on manual offline rewriting, demonstrating that (i) the system scales to hundreds of cores given enough parallelism of suitable granularity, and (ii) the JIT trace cost model predicts granularity accurately enough to guide rewriting towards a good adaptive transformation.
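
A skeleton rewrite that changes task granularity can be illustrated with a chunking equation such as `sum (map f xs) = sum (map (sum . map f) (chunk k xs))`: the result is unchanged, but there is one task per chunk rather than one per element. The sketch below is a minimal Python illustration of that single rewrite on a sum-of-Euler-totients workload. It is not the Racket/Pycket framework, the equation is an assumed example rather than one taken from the paper, and names such as `chunk_size` are hypothetical. Python threads are used only to show the task structure; they are not expected to reproduce the reported speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from math import gcd


def totient(n):
    """Euler's totient: how many k in 1..n are coprime to n (naive on purpose)."""
    return sum(1 for k in range(1, n + 1) if gcd(n, k) == 1)


def chunk(xs, size):
    """Split xs into consecutive sublists of at most `size` elements."""
    return [xs[i:i + size] for i in range(0, len(xs), size)]


def sum_euler_seq(n):
    """Reference result: sequential sum of totients over 1..n."""
    return sum(totient(i) for i in range(1, n + 1))


def sum_euler_chunked(n, chunk_size, workers=4):
    """Rewritten form: one task per chunk instead of one task per element.

    chunk_size is the granularity knob (hypothetical parameter name); an
    adaptive framework would choose it from a cost estimate.
    """
    chunks = chunk(list(range(1, n + 1)), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(lambda c: sum(totient(i) for i in c), chunks)
        return sum(partial_sums)
```

Calling `sum_euler_chunked(200, 1)` creates 200 fine-grained tasks, while `sum_euler_chunked(200, 50)` creates 4 coarse ones; both return the same value as `sum_euler_seq(200)`. Choosing between such forms per architecture is the kind of decision the abstract assigns to the cost model.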