Parallel Access of Out-Of-Core Dense Extendible Arrays
Datasets used in scientific and engineering applications are often modeled as dense multi-dimensional arrays. For very large datasets, the corresponding array models are typically stored out-of-core as array files. The array elements are mapped onto consecutive linear locations that correspond to the linear ordering of the multi-dimensional indices. The two conventional mappings are the row-major and column-major orders of multi-dimensional arrays. Such conventional mappings of dense array files severely limit both the performance of applications and the extendibility of the dataset. First, an array file organized in, say, row-major order causes applications that subsequently access the data in column-major order to perform very poorly. Second, any subsequent expansion of the array file is limited to only one dimension. Expanding such out-of-core conventional arrays along arbitrary dimensions requires storage reorganization that can be very expensive. We present a solution for storing out-of-core dense extendible arrays that resolves both limitations. The method uses a mapping function F*(), together with information maintained in axial vectors, to compute the linear address of an extendible array element when passed its k-dimensional index. We also give the inverse function, F^{-1}*(), for deriving the k-dimensional index when given the linear address. We show how the mapping function, in combination with MPI-IO and a parallel file system, allows the extendible array to grow without reorganization and without significant performance degradation for applications accessing elements in any desired order. We give methods for reading and writing sub-arrays into and out of parallel applications that run on a cluster of workstations. The axial vectors are replicated and maintained in each node that accesses sub-array elements.
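The two conventional mappings named in this abstract can be illustrated with a minimal Python sketch. It shows only the standard row-major and column-major address computations and the row-major inverse; it is not the paper's F*() or its axial vectors, which the abstract does not specify. The function names are hypothetical and chosen for illustration.

```python
from typing import Sequence, Tuple


def row_major_address(index: Sequence[int], shape: Sequence[int]) -> int:
    """Linear offset when the last dimension varies fastest."""
    addr = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for extent {n}")
        addr = addr * n + i
    return addr


def column_major_address(index: Sequence[int], shape: Sequence[int]) -> int:
    """Linear offset when the first dimension varies fastest."""
    addr = 0
    for i, n in zip(reversed(index), reversed(shape)):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for extent {n}")
        addr = addr * n + i
    return addr


def row_major_index(addr: int, shape: Sequence[int]) -> Tuple[int, ...]:
    """Inverse of row_major_address: recover the k-dimensional index."""
    index = []
    for n in reversed(shape):
        addr, i = divmod(addr, n)
        index.append(i)
    return tuple(reversed(index))
```

Under these assumptions, growing the first extent of a row-major array leaves existing addresses unchanged (element (1, 2) has the same offset for shapes (3, 4) and (5, 4)), whereas growing the last extent to (3, 6) moves it. This is the one-dimension expansion limit the abstract describes.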
Optimal Chunking of Large Multidimensional Arrays for Data Warehousing
Very large multidimensional arrays are commonly used in data-intensive scientific computations as well as in on-line analytical processing applications referred to as MOLAP. Such arrays are organized on disk by partitioning the large global array into fixed-size sub-arrays called chunks or tiles, which form the units of data transfer between disk and memory. Typical queries retrieve sub-arrays, accessing all chunks that overlap the query results. An important metric of storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is "what shapes of array chunks give the minimum expected number of chunks over a query workload?" The problem of optimal chunking was first introduced by Sarawagi and Stonebraker, who gave an approximate solution. In this paper we develop exact mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic and real-life workloads, show that our solutions are consistently within 2.0 percent of the true number of chunks retrieved for any number of dimensions. In contrast, the approximate solution of Sarawagi and Stonebraker can deviate considerably from the true result as the number of dimensions increases, and may also lead to suboptimal chunk shapes.
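The metric described in this abstract, the number of chunks a sub-array query overlaps, can be sketched in a few lines of Python. This only counts chunks for a given chunk shape and averages over a sample workload; it is not the paper's optimization model or its steepest descent and geometric programming solutions. The function names and the inclusive-bounds query representation are assumptions made for illustration.

```python
from math import prod
from typing import Iterable, Sequence, Tuple

Box = Tuple[Sequence[int], Sequence[int]]  # (lo, hi), both bounds inclusive


def chunks_touched(lo: Sequence[int], hi: Sequence[int],
                   chunk_shape: Sequence[int]) -> int:
    """Count the chunks overlapped by the query box [lo, hi]."""
    # Along each dimension, the box spans chunk numbers lo//c through hi//c.
    return prod(h // c - l // c + 1 for l, h, c in zip(lo, hi, chunk_shape))


def mean_chunks(queries: Iterable[Box], chunk_shape: Sequence[int]) -> float:
    """Average number of chunks retrieved over a sample query workload."""
    counts = [chunks_touched(lo, hi, chunk_shape) for lo, hi in queries]
    return sum(counts) / len(counts)
```

For example, a query covering rows 3 to 12 and columns 0 to 99 touches 20 chunks of shape (10, 10) but 16 chunks of shape (4, 25), which shows how the chunk shape alone changes the retrieval cost for the same query.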
High Performance P3M N-body code: CUBEP3M
This paper presents CUBEP3M, a publicly available, high-performance
cosmological N-body code, and describes many utilities and extensions that have
been added to the standard package. These include a memory-light runtime SO
halo finder, a non-Gaussian initial conditions generator, and a system of
unique particle identification. CUBEP3M is fast, its accuracy is tuneable to
optimize speed or memory, and it has been run on more than 27,000 cores, achieving
within a factor of two of ideal weak scaling even at this problem size. The
code can be run in an extra-lean mode where the peak memory imprint for large
runs is as low as 37 bytes per particle, which is almost two times leaner than
other widely used N-body codes. However, load imbalances can increase this
requirement by a factor of two, such that fast configurations with all the
utilities enabled and load imbalances factored in require between 70 and 120
bytes per particle. CUBEP3M is well suited to studying large-scale
cosmological systems, where imbalances are not too large and adaptive
time-stepping is not essential. It has already been used for a broad range of
science applications that require either large samples of non-linear
realizations or very large dark matter N-body simulations, including
cosmological reionization, halo formation, baryonic acoustic oscillations, weak
lensing, and non-Gaussian statistics. We discuss the structure, the accuracy,
known systematic effects, and the scaling performance of the code and its
utilities, when applicable.
Comment: 20 pages, 17 figures, added halo profiles, updated to match MNRAS
accepted version
A study of systems implementation languages for the POCCNET system
The results of a study of systems implementation languages for the Payload Operations Control Center Network (POCCNET) are presented. Criteria are developed for evaluating the languages, and fifteen existing languages are evaluated on the basis of these criteria.