63 research outputs found
Optimization of SAMtools sorting using OpenMP tasks
SAMtools is a widely used genomics application for post-processing high-throughput sequence alignment data. Such sequence alignment data are commonly sorted to make downstream analysis more efficient. However, this sorting process itself can be computationally and I/O intensive: high-throughput sequence alignment files in the de facto standard binary alignment/map (BAM) format can be many gigabytes in size, and may need to be decompressed before sorting and compressed afterwards. As a result, BAM-file sorting can be a bottleneck in genomics workflows. This paper describes a case study on the performance analysis and optimization of SAMtools for sorting large BAM files. OpenMP task parallelism and memory optimization techniques resulted in a speedup of 5.9X versus the upstream SAMtools 1.3.1 for an internal (in-memory) sort of 24.6 GiB of compressed BAM data (102.6 GiB uncompressed) with 32 processor cores, while a 1.98X speedup was achieved for an external (out-of-core) sort of a 271.4 GiB BAM file.
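The general shape of a task-parallel chunked sort can be sketched as follows. This is a minimal illustration, not the SAMtools implementation: it uses a Python thread pool in place of OpenMP tasks, plain integers in place of BAM records, and in-memory lists in place of temporary files. The function name `chunked_sort` and its parameters are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def chunked_sort(records, chunk_size, workers=4):
    """Sort records by sorting fixed-size chunks as independent tasks,
    then combining the sorted chunks with a k-way merge.

    Assumption: records fit in memory. An out-of-core sort would write
    each sorted chunk to disk and merge the resulting files instead.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    # Split the input into chunks; each chunk becomes one sorting task.
    chunks = [records[i:i + chunk_size]
              for i in range(0, len(records), chunk_size)]

    # Sort the chunks as independent tasks (stand-in for OpenMP tasks).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))

    # k-way merge of the sorted chunks into one sorted sequence.
    return list(heapq.merge(*sorted_chunks))
```

Calling `chunked_sort([5, 3, 9, 1, 7, 2], chunk_size=2)` sorts three chunks of two records each and merges them. In CPython the thread pool mainly illustrates the task structure and is not expected to give the speedups reported for the OpenMP version.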
A Delta Once More: Restoring Riparian and Wetland Habitat in the Colorado River Delta
Outlines the delta's history and current political context, documents recent findings about the delta's partial recovery, and makes recommendations for maintaining existing flows to further benefit and sustain the remnant wetland ecosystems.
Recovery from Fail-Stop Failures in Parallel Fortran Applications
The Fortran 2018 standard defines syntax and semantics to allow a parallel application to recover from failed images (processes) during execution. This poster presents work to extend the GFortran compiler front end and OpenCoarrays library to support fault-tolerant teams of images, enabling use of collective routines after an image failure.
Solving hadron structures using the basis light-front quantization approach on quantum computers
Quantum computing has demonstrated the potential to revolutionize our
understanding of nuclear, atomic, and molecular structure by obtaining
forefront solutions in non-relativistic quantum many-body theory. In this work,
we show that quantum computing can be used to solve for the structure of
hadrons, governed by strongly-interacting relativistic quantum field theory.
Following our previous work on light unflavored mesons as a relativistic
bound-state problem within the nonperturbative Hamiltonian formalism, we
present the numerical calculations on simulated quantum devices using the basis
light-front quantization (BLFQ) approach. We implement and compare the
variational quantum eigensolver (VQE) and the subspace-search variational
quantum eigensolver (SSVQE) to find the low-lying mass spectrum of the light
meson system and its corresponding light-front wave functions as quantum states
from ideal simulators, noisy simulators, and IBM quantum computers. Based on
obtained quantum states, we evaluate the meson decay constants and parton
distribution functions directly on the quantum circuits. Our calculations on
the quantum computers and simulators are in reasonable agreement with accurate
numerical solutions obtained on classical computers when the noise is moderately
small, and our overall results are comparable with the available experimental
data. Comment: 20 pages, 8 figures
Comparative study of variations in quantum approximate optimization algorithms for the Traveling Salesman Problem
The Traveling Salesman Problem (TSP) is one of the most often-used NP-Hard
problems in computer science to study the effectiveness of computing models and
hardware platforms. In this regard, it is also heavily used as a vehicle to
study the feasibility of the quantum computing paradigm for this class of
problems. In this paper, we tackle the TSP using the quantum approximate
optimization algorithm (QAOA) approach by formulating it as an optimization
problem. By adopting an improved qubit encoding strategy and a layerwise
learning optimization protocol, we present numerical results obtained from the
gate-based digital quantum simulator, specifically targeting TSP instances with
3, 4, and 5 cities. We focus on the evaluations of three distinctive QAOA mixer
designs, considering their performances in terms of numerical accuracy and
optimization cost. Notably, we find a well-balanced QAOA mixer design exhibits
more promising potential for gate-based simulators and realistic quantum
devices in the long run, an observation further supported by our noise model
simulations. Furthermore, we investigate the sensitivity of the simulations to
the TSP graph. Overall, our simulation results show the digital quantum
simulation of problem-inspired ansatz is a successful candidate for finding
optimal TSP solutions. Comment: 18 pages, 6 figures, 3 tables
High-performance epistasis detection in quantitative trait GWAS
epiSNP is a program for identifying pairwise single nucleotide polymorphism (SNP) interactions (epistasis) in quantitative-trait genome-wide association studies (GWAS). A parallel MPI version (EPISNPmpi) was created in 2008 to address this computationally expensive analysis on large data sets with many quantitative traits and SNP markers. However, the falling cost of genotyping has led to an explosion of large-scale GWAS data sets that challenge EPISNPmpi’s ability to compute results in a reasonable amount of time. Therefore, we optimized epiSNP for modern multi-core and highly parallel many-core processors to efficiently handle these large data sets. This paper describes the serial optimizations, dynamic load balancing using MPI-3 RMA operations, and shared-memory parallelization with OpenMP to further enhance load balancing and allow execution on the Intel Xeon Phi coprocessor (MIC). For a large GWAS data set, our optimizations provided a 38.43× speedup over EPISNPmpi on 126 nodes using 2 MICs on TACC’s Stampede Supercomputer. We also describe a Coarray Fortran (CAF) version that demonstrates the suitability of PGAS languages for problems with this computational pattern. We show that the Coarray version performs competitively with the MPI version on the NERSC Edison Cray XC30 supercomputer. Finally, the performance benefits of hyper-threading for this application on Edison (average 1.35× speedup) are demonstrated.
Paranormal operator on a Hilbert space
In this thesis an extensive study is made of the set P of all paranormal operators in B(H), the set of all bounded endomorphisms on the complex Hilbert space H. An operator T ∈ B(H) is paranormal if for each z contained in the resolvent set of T, d(z, σ(T)) ‖(T − zI)⁻¹‖ = 1, where d(z, σ(T)) is the distance from z to σ(T), the spectrum of T. P contains the set N of normal operators and P contains the set of hyponormal operators. However, P is contained in L, the set of all T ∈ B(H) such that the convex hull of the spectrum of T is equal to the closure of the numerical range of T. Thus, N ⊆ P ⊆ L.
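As a brief illustration of the resolvent condition (a sketch based on standard facts, not an excerpt from the thesis), the following shows why every normal operator satisfies it, and gives a 2×2 nilpotent matrix that does not. The particular matrix is chosen here only as an example.

```latex
% Requires amsmath for pmatrix.
% Normal case: by the spectral theorem, for z in the resolvent set of a normal T,
\[
  \|(T - zI)^{-1}\|
    = \sup_{\lambda \in \sigma(T)} \frac{1}{|\lambda - z|}
    = \frac{1}{d(z, \sigma(T))},
  \qquad\text{so}\qquad
  d(z, \sigma(T))\,\|(T - zI)^{-1}\| = 1 .
\]
% Non-normal case: a nilpotent 2x2 matrix with spectrum {0}.
\[
  T = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},
  \qquad \sigma(T) = \{0\},
  \qquad
  (T - zI)^{-1} = -\frac{1}{z}\begin{pmatrix} 1 & 1/z \\ 0 & 1 \end{pmatrix}
  \quad (z \neq 0).
\]
% Here d(z, sigma(T)) = |z|, and applying the matrix to e_2 bounds its norm from below.
\[
  d(z, \sigma(T))\,\|(T - zI)^{-1}\|
    = \left\| \begin{pmatrix} 1 & 1/z \\ 0 & 1 \end{pmatrix} \right\|
    \ge \sqrt{1 + |z|^{-2}} > 1 .
\]
```

The second computation shows that this T fails the condition for every z ≠ 0, so it is not paranormal in the sense defined above.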
If the uniform operator (norm) topology is placed on B(H), then the relative topological properties of N, P, L can be discussed. In Section IV, it is shown that: 1) N, P, and L are arc-wise connected and closed, 2) N, P, and L are nowhere dense subsets of B(H) when dim H ≥ 2,
3) N = P when dim H < ∞,
4) N is a nowhere dense subset of P when dim H = ∞,
5) P is not a nowhere dense subset of L when dim H < ∞, and
6) it is not known if P is a nowhere dense subset of L when dim H = ∞.
The spectral properties of paranormal operators are of current interest in the literature. Putnam [22, 23] has shown that certain points on the boundary of the spectrum of a paranormal operator are either normal eigenvalues or normal approximate eigenvalues. Stampfli [26] has shown that a hyponormal operator with countable spectrum is normal. However, in Theorem 3.3, it is shown that a paranormal operator T with countable spectrum can be written as the direct sum, N ⊕ A, of a normal operator N with σ(N) = σ(T) and of an operator A with σ(A) a subset of the derived set of σ(T). It is then shown that A need not be normal. If we restrict the countable spectrum of T ∈ P to lie on a C²-smooth rectifiable Jordan curve Go, then T must be normal [see Theorem 3.5 and its Corollary]. If T is a scalar paranormal operator with countable spectrum, then in order to conclude that T is normal the condition of σ(T) ⊆ Go can be relaxed [see Theorem 3.6]. In Theorem 3.7 it is then shown that the above result is not true when T is not assumed to be scalar. It was then conjectured that if T ∈ P with σ(T) ⊆ Go, then T is normal. The proof of Theorem 3.5 relies heavily on the assumption that T has countable spectrum and cannot be generalized. However, the corollary to Theorem 3.9 states that if T ∈ P with σ(T) ⊆ Go, then T has a non-trivial lattice of invariant subspaces. After the completion of most of the work on this thesis, Stampfli [30, 31] published a proof that a paranormal operator T with σ(T) ⊆ Go is normal. His proof uses some rather deep results concerning numerical ranges whereas the proof of Theorem 3.5 uses relatively elementary methods.
Comparing the Performance of MPI on the Cray Research T3E and IBM SP-2
This paper reports the performance of the Cray Research T3E and IBM SP-2 on a collection of communication tests that use MPI for the message passing. These tests have been designed to evaluate the performance of communication patterns that we feel are likely to occur in scientific programs. Communication tests were performed for messages of sizes 8 bytes, 1 KB, 100 KB, and 10 MB with 2, 4, 8, 16, 32 and 64 processors. Both machines provided a very high level of concurrency for the nearest neighbor communication tests and moderate concurrency on the broadcast operations. On the tests used, the T3E significantly outperformed the SP-2, with most performance tests running at least three times faster on the T3E.