Search CORE

356 research outputs found

Scalability analysis of declustering methods for multidimensional range queries

Author: Bongki Moon
Joel H. Saltz
Publication venue
Publication date: 01/01/1998
Field of study

Abstract—Efficient storage and retrieval of multiattribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high performance for disk accesses. Although the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper, we derive formulas describing the scalability of two popular declustering methods¦Disk Modulo and Fieldwise Xor¦for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods, and this is corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions

CiteSeerX

Scalability Analysis of Declustering Methods for Cartesian Product Files

Author: Moon Bongki
Saltz Joel
Publication venue
Publication date: 15/10/1998
Field of study

Efficient storage and retrieval of multi-attribute datasets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multi-attribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files over multiple disks to obtain high performance for disk accesses. Though the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper we derive formulas describing the scalability of two popular declustering methods Disk Modulo and Fieldwise Xor for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods and are corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure which can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions. (Also cross-referenced as UMIACS-TR-96-5

Digital Repository at the University of Maryland

Study of Scalable Declustering Algorithms for Parallel Grid Files

Author: Acharya Anurag
Moon Bongki
Saltz Joel
Publication venue
Publication date: 15/10/1998
Field of study

Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations which periodically generate snapshots of the state. The main challenge for efficiently handling such datasets is to minimize response time for multidimensional range queries. The grid file is one of the well known access methods for multidimensional and spatial data. We investigate effective and scalable declustering techniques for grid files with the primary goal of minimizing response time and the secondary goal of maximizing the fairness of data distribution. The main contributions of this paper are (1) analytic and experimental evaluation of existing index-based declustering techniques and their extensions for grid files, and (2) development of a proximity-based declustering algorithm called {\em minimax} which is experimentally shown to scale and to consistently achieve better response time compared to available algorithms while maintaining perfect disk distribution. (Also cross-referenced as UMIACS-TR-96-4

Digital Repository at the University of Maryland

Partitioning similarity graphs: A framework for declustering problems

Author: Barnes
Camp
Chang
Chen
Cheng
DeWitt
Du
Du
Duen-Ren Liu
Faloutsos
Faloutsos
Faloutsos
Fang
Fiduccia
Garey
Ghandeharizadeh
Ghandeharizadeh
Ghandeharizadeh
Himmatsingka
Jagadish
Kamel
Karypis
Kernighan
Kim
Kim
Kouramajian
Krishnamurthy
Li
Liu
Nievergelt
Rotem
Seeger
Shashi Shekhar
Shekhar
Weikum
Yeh
Zhou
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

Analysis and Comparison of Replicated Declustering Schemes

Author: Ali Saman Tosun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Supported Programming for Beginning Developers

Author: Gilbert Andrew
Publication venue: DigitalCommons@CalPoly
Publication date: 01/03/2019
Field of study

Testing code is important, but writing test cases can be time consuming, particularly for beginning programmers who are already struggling to write an implementation. We present TestBuilder, a system for test case generation which uses an SMT solver to generate inputs to reach specified lines in a function, and asks the user what the expected outputs would be for those inputs. The resulting test cases check the correctness of the output, rather than merely ensuring the code does not crash. Further, by querying the user for expectations, TestBuilder encourages the programmer to think about what their code ought to do, rather than assuming that whatever it does is correct. We demonstrate, using mutation testing of student projects, that tests generated by TestBuilder perform better than merely compiling the code using Python’s built-in compile function, although they underperform the tests students write when required to achieve 100% test coverage

DigitalCommons@CalPoly

In Situ Visualization of Performance Data in Parallel CFD Applications

Author: Falcao do Couto Alves Rigel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/01/2023
Field of study

This thesis summarizes the work of the author on visualization of performance data in parallel Computational Fluid Dynamics (CFD) simulations. Current performance analysis tools are unable to show their data on top of complex simulation geometries (e.g. an aircraft engine). But in CFD simulations, performance is expected to be affected by the computations being carried out, which in turn are tightly related to the underlying computational grid. Therefore it is imperative that performance data is visualized on top of the same computational geometry which they originate from. However, performance tools have no native knowledge of the underlying mesh of the simulation. This scientific gap can be filled by merging the branches of HPC performance analysis and in situ visualization of CFD simulations data, which shall be done by integrating existing, well established state-of-the-art tools from each field. In this threshold, an extension for the open-source performance tool Score-P was designed and developed, which intercepts an arbitrary number of manually selected code regions (mostly functions) and send their respective measurements – amount of executions and cumulative time spent – to the visualization software ParaView – through its in situ library, Catalyst –, as if they were any other flow-related variable. Subsequently the tool was extended with the capacity to also show communication data (messages sent between MPI ranks) on top of the CFD mesh. Testing and evaluation are done with two industry-grade codes: Rolls-Royce’s CFD code, Hydra, and Onera, DLR and Airbus’ CFD code, CODA. On the other hand, it has been also noticed that the current performance tools have limited capacity of displaying their data on top of three-dimensional, framed (i.e. time-stepped) representations of the cluster’s topology. Parallel to that, in order for the approach not to be limited to codes which already have the in situ adapter, it was extended to take the performance data and display it – also in codes without in situ – on a three-dimensional, framed representation of the hardware resources being used by the simulation. Testing is done with the Multi-Grid and Block Tri-diagonal NAS Parallel Benchmarks (NPB), as well as with Hydra and CODA again. The benchmarks are used to explain how the new visualizations work, while real performance analyses are done with the industry-grade CFD codes. The proposed solution is able to provide concrete performance insights, which would not have been reached with the current performance tools and which motivated beneficial changes in the respective source code in real life. Finally, its overhead is discussed and proven to be suitable for usage with CFD codes. The dissertation provides a valuable addition to the state of the art of highly parallel CFD performance analysis and serves as basis for further suggested research directions

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

P-Pascal : a data-oriented persistent programming language

Author: Berman Sonia
Publication venue: Department of Computer Science
Publication date: 01/01/1991
Field of study

Bibliography: pages 187-199.Persistence is measured by the length of time an object is retained and is usable in a system. Persistent languages extend general purpose languages by providing the full range of persistence for data of any type. Moreover, data which remains on disk after program termination, is manipulated in the same way as transient data. As these languages are based on general purpose programming languages, they tend to be program-centred rather than data-centred. This thesis investigates the inclusion of data-oriented features in a persistent programming language. P-Pascal, a Persistent Pascal, has been designed and implemented to develop techniques for data clustering, metadata maintenance, security enforcement and bulk data management. It introduces type completeness to Pascal and in particular shows how a type-complete set constructor can be provided. This type is shown to be a practical and versatile mechanism for handling bulk data collections in a persistent environment. Relational algebra operators are provided and the automatic optimisation of set expressions is performed by the compiler and the runtime system. The P-Pascal Abstract Machine incorporates two complementary data placement strategies, automatic updating of type information, and metadata query facilities. The protection of data types, primary (named) objects and their individual components is supported. The challenges and opportunities presented by the persistent store organisation are discussed, and techniques for efficiently exploiting these properties are proposed. We also describe the effects on a data-oriented system of treating persistent and transient data alike, so that they cannot be distinguished statically. We conclude that object clustering, metadata maintenance and security enforcement can and should be incorporated in persistent programming languages. The provision of a built-in, type-complete bulk data constructor and its non-procedural operators is demonstrated. We argue that this approach is preferable to engineering such objects on top of a language, because of greater ease of use and considerable opportunity for automatic optimisation. The existence of such a type does not preclude programmers from constructing their own bulk objects using other types - this is but one advantage of a persistent language over a database system

Cape Town University OpenUCT

Numerical aerodynamic simulation facility feasibility study

Author
Publication venue
Publication date
Field of study

There were three major issues examined in the feasibility study. First, the ability of the proposed system architecture to support the anticipated workload was evaluated. Second, the throughput of the computational engine (the flow model processor) was studied using real application programs. Third, the availability reliability, and maintainability of the system were modeled. The evaluations were based on the baseline systems. The results show that the implementation of the Numerical Aerodynamic Simulation Facility, in the form considered, would indeed be a feasible project with an acceptable level of risk. The technology required (both hardware and software) either already exists or, in the case of a few parts, is expected to be announced this year. Facets of the work described include the hardware configuration, software, user language, and fault tolerance

NASA Technical Reports Server

GIZMO: A New Class of Accurate, Mesh-Free Hydrodynamic Simulation Methods

Author: Hopkins Philip F.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 11/06/2015
Field of study

We present two new Lagrangian methods for hydrodynamics, in a systematic comparison with moving-mesh, SPH, and stationary (non-moving) grid methods. The new methods are designed to simultaneously capture advantages of both smoothed-particle hydrodynamics (SPH) and grid-based/adaptive mesh refinement (AMR) schemes. They are based on a kernel discretization of the volume coupled to a high-order matrix gradient estimator and a Riemann solver acting over the volume 'overlap.' We implement and test a parallel, second-order version of the method with self-gravity & cosmological integration, in the code GIZMO: this maintains exact mass, energy and momentum conservation; exhibits superior angular momentum conservation compared to all other methods we study; does not require 'artificial diffusion' terms; and allows the fluid elements to move with the flow so resolution is automatically adaptive. We consider a large suite of test problems, and find that on all problems the new methods appear competitive with moving-mesh schemes, with some advantages (particularly in angular momentum conservation), at the cost of enhanced noise. The new methods have many advantages vs. SPH: proper convergence, good capturing of fluid-mixing instabilities, dramatically reduced 'particle noise' & numerical viscosity, more accurate sub-sonic flow evolution, & sharp shock-capturing. Advantages vs. non-moving meshes include: automatic adaptivity, dramatically reduced advection errors & numerical overmixing, velocity-independent errors, accurate coupling to gravity, good angular momentum conservation and elimination of 'grid alignment' effects. We can, for example, follow hundreds of orbits of gaseous disks, while AMR and SPH methods break down in a few orbits. However, fixed meshes minimize 'grid noise.' These differences are important for a range of astrophysical problems.Comment: 57 pages, 33 figures. MNRAS. A public version of the GIZMO code, user's guide, test problem setups, and movies are available at http://www.tapir.caltech.edu/~phopkins/Site/GIZMO.htm

arXiv.org e-Print Archive

CiteSeerX

Caltech Authors