8,892 research outputs found

    Automating embedded analysis capabilities and managing software complexity in multiphysics simulation part II: application to partial differential equations

    Full text link
    A template-based generic programming approach was presented in a previous paper that separates the development effort of programming a physical model from that of computing additional quantities, such as derivatives, needed for embedded analysis algorithms. In this paper, we describe the implementation details for using the template-based generic programming approach for simulation and analysis of partial differential equations (PDEs). We detail several of the hurdles that we have encountered, and some of the software infrastructure developed to overcome them. We end with a demonstration where we present shape optimization and uncertainty quantification results for a 3D PDE application

    Matching non-uniformity for program optimizations on heterogeneous many-core systems

    Get PDF
    As computing enters an era of heterogeneity and massive parallelism, it exhibits a distinct feature: the deepening non-uniform relations among the computing elements in both hardware and software. Besides traditional non-uniform memory accesses, much deeper non-uniformity shows in a processor, runtime, and application, exemplified by the asymmetric cache sharing, memory coalescing, and thread divergences on multicore and many-core processors. Being oblivious to the non-uniformity, current applications fail to tap into the full potential of modern computing devices.;My research presents a systematic exploration into the emerging property. It examines the existence of such a property in modern computing, its influence on computing efficiency, and the challenges for establishing a non-uniformity--aware paradigm. I propose several techniques to translate the property into efficiency, including data reorganization to eliminate non-coalesced accesses, asynchronous data transformations for locality enhancement and a controllable scheduling for exploiting non-uniformity among thread blocks. The experiments show much promise of these techniques in maximizing computing throughput, especially for programs with complex data access patterns

    Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

    Full text link
    We study the problem of approximating the 33-profile of a large graph. 33-profiles are generalizations of triangle counts that specify the number of times a small graph appears as an induced subgraph of a large graph. Our algorithm uses the novel concept of 33-profile sparsifiers: sparse graphs that can be used to approximate the full 33-profile counts for a given large graph. Further, we study the problem of estimating local and ego 33-profiles, two graph quantities that characterize the local neighborhood of each vertex of a graph. Our algorithm is distributed and operates as a vertex program over the GraphLab PowerGraph framework. We introduce the concept of edge pivoting which allows us to collect 22-hop information without maintaining an explicit 22-hop neighborhood list at each vertex. This enables the computation of all the local 33-profiles in parallel with minimal communication. We test out implementation in several experiments scaling up to 640640 cores on Amazon EC2. We find that our algorithm can estimate the 33-profile of a graph in approximately the same time as triangle counting. For the harder problem of ego 33-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, in the timescale of minutes.Comment: To appear in part at KDD'1

    Prospects and limitations of full-text index structures in genome analysis

    Get PDF
    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared

    Concrete resource analysis of the quantum linear system algorithm used to compute the electromagnetic scattering cross section of a 2D target

    Get PDF
    We provide a detailed estimate for the logical resource requirements of the quantum linear system algorithm (QLSA) [Phys. Rev. Lett. 103, 150502 (2009)] including the recently described elaborations [Phys. Rev. Lett. 110, 250504 (2013)]. Our resource estimates are based on the standard quantum-circuit model of quantum computation; they comprise circuit width, circuit depth, the number of qubits and ancilla qubits employed, and the overall number of elementary quantum gate operations as well as more specific gate counts for each elementary fault-tolerant gate from the standard set {X, Y, Z, H, S, T, CNOT}. To perform these estimates, we used an approach that combines manual analysis with automated estimates generated via the Quipper quantum programming language and compiler. Our estimates pertain to the example problem size N=332,020,680 beyond which, according to a crude big-O complexity comparison, QLSA is expected to run faster than the best known classical linear-system solving algorithm. For this problem size, a desired calculation accuracy 0.01 requires an approximate circuit width 340 and circuit depth of order 102510^{25} if oracle costs are excluded, and a circuit width and depth of order 10810^8 and 102910^{29}, respectively, if oracle costs are included, indicating that the commonly ignored oracle resources are considerable. In addition to providing detailed logical resource estimates, it is also the purpose of this paper to demonstrate explicitly how these impressively large numbers arise with an actual circuit implementation of a quantum algorithm. While our estimates may prove to be conservative as more efficient advanced quantum-computation techniques are developed, they nevertheless provide a valid baseline for research targeting a reduction of the resource requirements, implying that a reduction by many orders of magnitude is necessary for the algorithm to become practical.Comment: 37 pages, 40 figure

    A Survey on Array Storage, Query Languages, and Systems

    Full text link
    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention.Comment: 44 page
    corecore