Analyze Large Multidimensional Datasets Using Algebraic Topology
This paper presents an efficient algorithm to extract knowledge from high-dimensionality, high-complexity datasets using algebraic topology, namely simplicial complexes. Based on the concept of isomorphism of relations, our method turns a relational table into a geometric object (a simplicial complex is a polyhedron), so association-rule searching conceptually becomes a geometric traversal problem. By leveraging the core concepts behind simplicial complexes, we use a technique that is new to computer science, improves performance over existing methods, and uses far less memory. It was designed and developed with a strong emphasis on scalability, reliability, and extensibility. This paper also investigates the possibility of Hadoop integration and the challenges that come with that framework.
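The abstract does not spell out the paper's data structures, but the table-to-complex idea can be sketched: treat each row as a maximal simplex whose vertices are the items it contains, so the support of a candidate itemset is the number of maximal simplices having it as a face. A minimal C++ sketch under that assumption (the names and the face-enumeration strategy are illustrative, not taken from the paper):

```cpp
// Minimal sketch: treat each relational-table row as a maximal simplex
// whose vertices are the items it contains, and count the support of every
// low-dimensional face. Frequent faces correspond to association-rule
// candidates. Illustrative only; not the paper's actual data structures.
#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <vector>

using Simplex = std::set<std::string>;  // a face = a set of vertices (items)

// Enumerate all faces of `row` with exactly `k` vertices and tally them.
static void faces(const std::vector<std::string>& row, size_t k, size_t start,
                  Simplex& cur, std::map<Simplex, int>& support) {
    if (cur.size() == k) { ++support[cur]; return; }
    for (size_t i = start; i < row.size(); ++i) {
        cur.insert(row[i]);
        faces(row, k, i + 1, cur, support);
        cur.erase(row[i]);
    }
}

int main() {
    // Toy relational table: each row becomes a maximal simplex.
    std::vector<std::vector<std::string>> table = {
        {"bread", "milk", "eggs"},
        {"bread", "milk"},
        {"milk", "eggs"},
    };
    std::map<Simplex, int> support;  // face -> number of rows containing it
    for (const auto& row : table) {
        Simplex cur;
        faces(row, 2, 0, cur, support);  // count all 1-simplices (edges)
    }
    for (const auto& [face, count] : support) {
        std::printf("{ ");
        for (const auto& v : face) std::printf("%s ", v.c_str());
        std::printf("} support=%d\n", count);
    }
}
```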
A Hybrid Multi-GPU Implementation of Simplex Algorithm with CPU Collaboration
The simplex algorithm has been used successfully for many years to solve linear programming (LP) problems. Due to the intensive computation required (especially for large LP problems), parallel approaches have also been studied extensively. The computational power of modern GPUs, together with the rapid development of multicore CPU systems, has made OpenMP and CUDA the top programming-model choices in recent years. However, efficient collaboration between CPU and GPU through the combined use of these programming models is still considered a hard research problem. In this context, we demonstrate a highly efficient implementation of the standard simplex method that targets the best possible exploitation of all computing resources used concurrently on a multicore platform with multiple CUDA-enabled GPUs. More concretely, we present a novel hybrid collaboration scheme based on the concurrent execution of suitably distributed CPU-assigned (via multithreading) and GPU-offloaded computations. Experimental results obtained through the cooperative use of OpenMP and CUDA on a notably powerful modern hybrid platform (32 cores and two high-end GPUs, a Titan RTX and an RTX 2080 Ti) show that the performance of the hybrid GPU/CPU collaboration scheme presented here is clearly superior to a GPU-only implementation under almost all conditions. The corresponding measurements validate the value of using all resources concurrently, even on a multi-GPU platform. Furthermore, the given implementations are fully comparable (and in most cases slightly superior) to related attempts in the literature, and clearly superior to the native 32-core CPU implementation.
Comment: 12 pages
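The abstract gives no code, but the core data-parallel step that CPU threads and GPUs would share in such a scheme is the tableau pivot, in which every non-pivot row is updated independently. A minimal sketch of that row-parallel update, assuming a dense standard-simplex tableau and showing only the OpenMP (CPU) side of the collaboration:

```cpp
// Minimal sketch of the row-parallel pivot update at the heart of a dense
// standard-simplex iteration. In a hybrid scheme like the paper's, blocks of
// rows would be assigned to CPU threads (as below, via OpenMP) while other
// blocks are offloaded to the GPUs; only the CPU side is sketched here.
#include <cstdio>
#include <vector>

// Apply one Gauss-Jordan pivot at (prow, pcol) to an m x n tableau.
void pivot(std::vector<std::vector<double>>& T, int prow, int pcol) {
    const int m = static_cast<int>(T.size());
    const int n = static_cast<int>(T[0].size());
    const double p = T[prow][pcol];
    for (int j = 0; j < n; ++j) T[prow][j] /= p;  // normalize pivot row
    // Eliminate the pivot column from every other row; the rows are
    // independent, so this loop parallelizes directly.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < m; ++i) {
        if (i == prow) continue;
        const double f = T[i][pcol];
        for (int j = 0; j < n; ++j) T[i][j] -= f * T[prow][j];
    }
}

int main() {
    // Tiny tableau: maximize x + y s.t. x + 2y <= 4, 3x + y <= 6 (slacks s1, s2).
    std::vector<std::vector<double>> T = {
        { 1, 2, 1, 0, 4},   //  x + 2y + s1 = 4
        { 3, 1, 0, 1, 6},   // 3x +  y + s2 = 6
        {-1,-1, 0, 0, 0},   // objective row
    };
    pivot(T, 1, 0);  // bring x into the basis (the ratio test picks row 1)
    std::printf("objective row after pivot: %.2f %.2f %.2f %.2f | %.2f\n",
                T[2][0], T[2][1], T[2][2], T[2][3], T[2][4]);
}
```

The independence of the row updates is what makes it natural to carve the tableau into blocks, assign some blocks to CPU threads, and offload the rest to the GPUs.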
Solution of the Skyrme-Hartree-Fock-Bogolyubov equations in the Cartesian deformed harmonic-oscillator basis. (VII) HFODD (v2.49t): a new version of the program
We describe the new version (v2.49t) of the code HFODD, which solves the nuclear Skyrme Hartree-Fock (HF) or Skyrme Hartree-Fock-Bogolyubov (HFB) problem using the Cartesian deformed harmonic-oscillator basis. In the new version, we have implemented the following physics features: (i) isospin mixing and projection, (ii) the finite-temperature formalism for the HFB and HF+BCS methods, (iii) the Lipkin translational energy correction method, and (iv) the calculation of the shell correction. A number of specific numerical methods have also been implemented to deal with large-scale multi-constraint calculations and hardware limitations: (i) the two-basis method for the HFB method, (ii) the Augmented Lagrangian Method (ALM) for multi-constraint calculations, (iii) the linear constraint method based on the approximation of the RPA matrix for multi-constraint calculations, (iv) an interface with the axial, parity-conserving Skyrme-HFB code HFBTHO, and (v) the mixing of the HF or HFB matrix elements instead of the HF fields. Special care has been taken to make the code usable on massively parallel leadership-class computers. For this purpose, the following features are now available in this version: (i) the Message Passing Interface (MPI) framework, (ii) scalable input-data routines, (iii) multi-threading via OpenMP pragmas, and (iv) parallel diagonalization of the HFB matrix in the simplex-breaking case using the ScaLAPACK library. Finally, several errors of little significance in the previously published version were corrected.
Comment: Accepted for publication in Computer Physics Communications. Program files re-submitted to the Comp. Phys. Comm. Program Library after correction of several minor bugs.
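As a worked illustration of numerical feature (ii), the Augmented Lagrangian Method in its standard textbook form (notation chosen here for illustration; the paper's own conventions may differ) augments the energy with both linear and quadratic constraint terms:

```latex
% Augmented Lagrangian Method (ALM) for a multi-constraint mean-field
% calculation: standard textbook form, with notation chosen here for
% illustration rather than taken from the HFODD paper.
% Constrain expectation values <Q_a> of operators Q_a to targets q_a by
% minimizing the augmented functional
\begin{equation}
  E'[\rho] \;=\; E[\rho]
  \;+\; \sum_a \lambda_a \left(\langle Q_a\rangle - q_a\right)
  \;+\; \sum_a C_a \left(\langle Q_a\rangle - q_a\right)^2 ,
\end{equation}
% and, after each self-consistent iteration, update the multipliers as
\begin{equation}
  \lambda_a \;\leftarrow\; \lambda_a + 2 C_a \left(\langle Q_a\rangle - q_a\right) ,
\end{equation}
% which, unlike the pure quadratic penalty ($\lambda_a = 0$), drives
% <Q_a> to q_a exactly without requiring $C_a \to \infty$.
```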
Scalable Empirical Dynamic Modeling With Parallel Computing and Approximate k-NN Search
Empirical Dynamic Modeling (EDM) is a mathematical framework for modeling and predicting non-linear time series data. Although EDM is increasingly adopted in various research fields, its application to large-scale data has been limited by its high computational cost. This article presents kEDM, a high-performance implementation of EDM for analyzing large-scale time series datasets. kEDM adopts the Kokkos performance-portable programming model to run efficiently on both CPU and GPU from a single code base. We also conduct hardware-specific optimization of performance-critical kernels. kEDM achieved up to 6.58× speedup in pairwise causal inference on real-world biology datasets compared to an existing EDM implementation. Furthermore, we integrate multiple approximate k-NN search algorithms into EDM to enable the analysis of extremely large datasets that were intractable with conventional EDM based on exhaustive k-NN search. EDM-based time-series forecasting enhanced with approximate k-NN search demonstrated up to 790× speedup over conventional Simplex projection, with less than a 1% increase in MAPE.
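As background for the reported speedups, Simplex projection itself is a simple k-NN forecaster: embed the series in E dimensions with lagged coordinates, find the E+1 nearest neighbors of the query state, and average their futures with exponentially decaying weights. A minimal C++ sketch of this baseline with exhaustive neighbor search (the part kEDM replaces with approximate k-NN and ports to Kokkos kernels; the names and parameters here are illustrative):

```cpp
// Minimal sketch of EDM Simplex projection with exhaustive k-NN search
// (the baseline that kEDM accelerates). Illustrative, not kEDM's code.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Predict x[t + Tp] from the E-dimensional lagged embedding of x.
double simplex_predict(const std::vector<double>& x, int E, int Tp, int t) {
    const int tau = 1;                          // lag between coordinates
    auto dist = [&](int a, int b) {             // distance between embeddings
        double s = 0.0;
        for (int j = 0; j < E; ++j) {
            double d = x[a - j * tau] - x[b - j * tau];
            s += d * d;
        }
        return std::sqrt(s);
    };
    // Library: all fully embedded points whose Tp-step future is known,
    // excluding the query point itself.
    std::vector<std::pair<double, int>> nn;     // (distance, index)
    for (int i = E - 1; i + Tp < static_cast<int>(x.size()); ++i)
        if (i != t) nn.push_back({dist(i, t), i});
    const int k = E + 1;                        // simplex uses E+1 neighbors
    std::partial_sort(nn.begin(), nn.begin() + k, nn.end());
    // Exponentially weighted average of the neighbors' futures.
    double dmin = std::max(nn[0].first, 1e-12), wsum = 0.0, pred = 0.0;
    for (int i = 0; i < k; ++i) {
        double w = std::exp(-nn[i].first / dmin);
        wsum += w;
        pred += w * x[nn[i].second + Tp];
    }
    return pred / wsum;
}

int main() {
    std::vector<double> x;
    for (int i = 0; i < 200; ++i) x.push_back(std::sin(0.3 * i));
    double p = simplex_predict(x, /*E=*/3, /*Tp=*/1, /*t=*/150);
    std::printf("predicted %.4f, actual %.4f\n", p, x[151]);
}
```

The exhaustive neighbor loop above is the O(n²) bottleneck that approximate k-NN search removes, which is where the 790× forecasting speedup comes from.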
Limits on Fundamental Limits to Computation
An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by the periodic doubling of transistor densities in integrated circuits over the last fifty years. Such Moore scaling now requires increasingly heroic efforts, stimulating research into alternative hardware and stirring controversy. To help evaluate emerging technologies and enrich our understanding of integrated-circuit scaling, we review fundamental limits to computation in manufacturing, energy, physical space, design and verification effort, and algorithms. To outline what is achievable in principle and in practice, we recall how some limits were circumvented and compare loose and tight limits. We also point out that engineering difficulties encountered by emerging technologies may indicate yet-unknown limits.
Comment: 15 pages, 4 figures, 1 table