8,892 research outputs found
Automating embedded analysis capabilities and managing software complexity in multiphysics simulation part II: application to partial differential equations
A template-based generic programming approach was presented in a previous
paper that separates the development effort of programming a physical model
from that of computing additional quantities, such as derivatives, needed for
embedded analysis algorithms. In this paper, we describe the implementation
details for using the template-based generic programming approach for
simulation and analysis of partial differential equations (PDEs). We detail
several of the hurdles that we have encountered, and some of the software
infrastructure developed to overcome them. We conclude with a demonstration
presenting shape optimization and uncertainty quantification results for a 3D
PDE application.
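The core idea of template-based generic programming is that a physical model written against a generic scalar type can be re-evaluated with an augmented type to obtain derivatives for free. A minimal sketch of that separation, using a hypothetical Python dual-number class in place of the paper's C++ templates:

```python
# Hypothetical illustration (not the paper's C++ implementation): the same
# "physics" routine evaluates values with floats and derivatives with duals.
class Dual:
    """Scalar carrying a value and a first derivative (forward-mode AD)."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)
    __rmul__ = __mul__

def residual(u):
    # One generic model routine: works unchanged for float and Dual inputs.
    return 3.0 * u * u + 2.0 * u + 1.0

value = residual(2.0)               # plain evaluation: 17.0
deriv = residual(Dual(2.0, 1.0)).der  # d/du = 6u + 2 at u=2: 14.0
```

The model developer writes `residual` once; the embedded-analysis machinery chooses the scalar type, which is the separation of concerns the abstract describes.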
Matching non-uniformity for program optimizations on heterogeneous many-core systems
As computing enters an era of heterogeneity and massive parallelism, it exhibits a distinct feature: deepening non-uniform relations among the computing elements in both hardware and software. Beyond traditional non-uniform memory access, much deeper non-uniformity appears within processors, runtimes, and applications, exemplified by asymmetric cache sharing, memory coalescing, and thread divergence on multicore and many-core processors. Oblivious to this non-uniformity, current applications fail to tap the full potential of modern computing devices.

My research presents a systematic exploration of this emerging property. It examines the existence of such a property in modern computing, its influence on computing efficiency, and the challenges of establishing a non-uniformity-aware paradigm. I propose several techniques to translate the property into efficiency, including data reorganization to eliminate non-coalesced accesses, asynchronous data transformations for locality enhancement, and controllable scheduling for exploiting non-uniformity among thread blocks. The experiments show much promise for these techniques in maximizing computing throughput, especially for programs with complex data access patterns.
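The "data reorganization to eliminate non-coalesced accesses" mentioned above is commonly realized as an array-of-structs to struct-of-arrays transformation. A hypothetical Python sketch of the layout change (on a GPU the same reorganization lets a warp read one field with a single coalesced transaction):

```python
# Hypothetical illustration of the data-reorganization idea: converting an
# array-of-structs (AoS) layout into struct-of-arrays (SoA) so that threads
# reading the same field touch contiguous memory.
particles_aos = [
    {"x": 0.0, "y": 1.0, "mass": 2.0},
    {"x": 3.0, "y": 4.0, "mass": 5.0},
    {"x": 6.0, "y": 7.0, "mass": 8.0},
]

def aos_to_soa(records):
    """Reorganize a list of records into one contiguous list per field."""
    fields = records[0].keys()
    return {f: [r[f] for r in records] for f in fields}

particles_soa = aos_to_soa(particles_aos)
# All x-coordinates are now adjacent in memory order:
xs = particles_soa["x"]  # [0.0, 3.0, 6.0]
```

In the AoS layout, consecutive threads reading `x` would load strided addresses; after the transformation the loads are unit-stride, which is what hardware coalescing rewards.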
Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs
We study the problem of approximating the 3-profile of a large graph.
3-profiles are generalizations of triangle counts that specify the number of
times a small graph appears as an induced subgraph of a large graph. Our
algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that
can be used to approximate the full 3-profile counts for a given large graph.
Further, we study the problem of estimating local and ego 3-profiles, two
graph quantities that characterize the local neighborhood of each vertex of a
graph.
Our algorithm is distributed and operates as a vertex program over the
GraphLab PowerGraph framework. We introduce the concept of edge pivoting,
which allows us to collect 2-hop information without maintaining an explicit
2-hop neighborhood list at each vertex. This enables the computation of all
the local 3-profiles in parallel with minimal communication.
We test our implementation in several experiments scaling up to cores
on Amazon EC2. We find that our algorithm can estimate the 3-profile of a
graph in approximately the same time as triangle counting. For the harder
problem of ego 3-profiles, we introduce an algorithm that can estimate
profiles of hundreds of thousands of vertices in parallel, on the timescale of
minutes.
Comment: To appear in part at KDD'1
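As a point of reference for what a 3-profile is, here is a hypothetical brute-force computation (not the paper's distributed sparsifier algorithm): over all vertex triples, count how many induce 0, 1, 2, or 3 edges, i.e. the empty graph, a single edge, a wedge/path, or a triangle.

```python
# Brute-force 3-profile of a small undirected graph, for illustration only.
from itertools import combinations

def three_profile(n, edges):
    """Return [#empty, #one-edge, #wedge, #triangle] over all vertex triples."""
    edge_set = {frozenset(e) for e in edges}
    profile = [0, 0, 0, 0]
    for triple in combinations(range(n), 3):
        # Count how many of the 3 possible pairs within the triple are edges.
        k = sum(frozenset(p) in edge_set for p in combinations(triple, 2))
        profile[k] += 1
    return profile

# 4-cycle on vertices 0..3: every one of the four triples induces a wedge.
print(three_profile(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # [0, 0, 4, 0]
```

This enumeration is cubic in the vertex count, which is exactly why the paper's sparsifier-based, distributed approach is needed at scale.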
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.
Concrete resource analysis of the quantum linear system algorithm used to compute the electromagnetic scattering cross section of a 2D target
We provide a detailed estimate for the logical resource requirements of the
quantum linear system algorithm (QLSA) [Phys. Rev. Lett. 103, 150502 (2009)]
including the recently described elaborations [Phys. Rev. Lett. 110, 250504
(2013)]. Our resource estimates are based on the standard quantum-circuit model
of quantum computation; they comprise circuit width, circuit depth, the number
of qubits and ancilla qubits employed, and the overall number of elementary
quantum gate operations as well as more specific gate counts for each
elementary fault-tolerant gate from the standard set {X, Y, Z, H, S, T, CNOT}.
To perform these estimates, we used an approach that combines manual analysis
with automated estimates generated via the Quipper quantum programming language
and compiler. Our estimates pertain to the example problem size N=332,020,680
beyond which, according to a crude big-O complexity comparison, QLSA is
expected to run faster than the best known classical linear-system solving
algorithm. For this problem size, a desired calculation accuracy 0.01 requires
an approximate circuit width 340 and circuit depth of order 10^25 if oracle
costs are excluded, and a circuit width and depth of order 10^8 and
10^29, respectively, if oracle costs are included, indicating that the
commonly ignored oracle resources are considerable. In addition to providing
detailed logical resource estimates, it is also the purpose of this paper to
demonstrate explicitly how these impressively large numbers arise with an
actual circuit implementation of a quantum algorithm. While our estimates may
prove to be conservative as more efficient advanced quantum-computation
techniques are developed, they nevertheless provide a valid baseline for
research targeting a reduction of the resource requirements, implying that a
reduction by many orders of magnitude is necessary for the algorithm to become
practical.
Comment: 37 pages, 40 figures
A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete, though; we greatly appreciate
pointers towards any work we might have forgotten to mention.
Comment: 44 pages
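The "array partitioning into chunks" that the survey's storage section covers reduces, at its core, to mapping a cell's n-dimensional coordinates to a chunk and an offset within it. A hypothetical sketch of that mapping for regular (uniform-size) chunks:

```python
# Hypothetical sketch of regular chunking: locate which chunk stores a cell
# and where inside that chunk it sits.
def locate(coords, chunk_shape):
    """Return (chunk index per dimension, within-chunk offset per dimension)."""
    chunk_idx = tuple(c // s for c, s in zip(coords, chunk_shape))
    offset = tuple(c % s for c, s in zip(coords, chunk_shape))
    return chunk_idx, offset

# A 100x100 array split into 10x10 regular chunks:
print(locate((37, 42), (10, 10)))  # ((3, 4), (7, 2))
```

Real array systems vary this scheme (irregular chunks, overlapping chunks, space-filling-curve chunk ordering), and those variations are among the storage trade-offs the survey compares.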