356 research outputs found

    Scalability analysis of declustering methods for multidimensional range queries

    Get PDF
    Abstract—Efficient storage and retrieval of multiattribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high performance for disk accesses. Although the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper, we derive formulas describing the scalability of two popular declustering methods¦Disk Modulo and Fieldwise Xor¦for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods, and this is corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions

    Scalability Analysis of Declustering Methods for Cartesian Product Files

    Get PDF
    Efficient storage and retrieval of multi-attribute datasets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multi-attribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files over multiple disks to obtain high performance for disk accesses. Though the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper we derive formulas describing the scalability of two popular declustering methods Disk Modulo and Fieldwise Xor for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods and are corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure which can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions. (Also cross-referenced as UMIACS-TR-96-5

    Study of Scalable Declustering Algorithms for Parallel Grid Files

    Get PDF
    Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations which periodically generate snapshots of the state. The main challenge for efficiently handling such datasets is to minimize response time for multidimensional range queries. The grid file is one of the well known access methods for multidimensional and spatial data. We investigate effective and scalable declustering techniques for grid files with the primary goal of minimizing response time and the secondary goal of maximizing the fairness of data distribution. The main contributions of this paper are (1) analytic and experimental evaluation of existing index-based declustering techniques and their extensions for grid files, and (2) development of a proximity-based declustering algorithm called {\em minimax} which is experimentally shown to scale and to consistently achieve better response time compared to available algorithms while maintaining perfect disk distribution. (Also cross-referenced as UMIACS-TR-96-4

    Analysis and Comparison of Replicated Declustering Schemes

    Full text link

    Supported Programming for Beginning Developers

    Get PDF
    Testing code is important, but writing test cases can be time consuming, particularly for beginning programmers who are already struggling to write an implementation. We present TestBuilder, a system for test case generation which uses an SMT solver to generate inputs to reach specified lines in a function, and asks the user what the expected outputs would be for those inputs. The resulting test cases check the correctness of the output, rather than merely ensuring the code does not crash. Further, by querying the user for expectations, TestBuilder encourages the programmer to think about what their code ought to do, rather than assuming that whatever it does is correct. We demonstrate, using mutation testing of student projects, that tests generated by TestBuilder perform better than merely compiling the code using Python’s built-in compile function, although they underperform the tests students write when required to achieve 100% test coverage

    In Situ Visualization of Performance Data in Parallel CFD Applications

    Get PDF
    This thesis summarizes the work of the author on visualization of performance data in parallel Computational Fluid Dynamics (CFD) simulations. Current performance analysis tools are unable to show their data on top of complex simulation geometries (e.g. an aircraft engine). But in CFD simulations, performance is expected to be affected by the computations being carried out, which in turn are tightly related to the underlying computational grid. Therefore it is imperative that performance data is visualized on top of the same computational geometry which they originate from. However, performance tools have no native knowledge of the underlying mesh of the simulation. This scientific gap can be filled by merging the branches of HPC performance analysis and in situ visualization of CFD simulations data, which shall be done by integrating existing, well established state-of-the-art tools from each field. In this threshold, an extension for the open-source performance tool Score-P was designed and developed, which intercepts an arbitrary number of manually selected code regions (mostly functions) and send their respective measurements – amount of executions and cumulative time spent – to the visualization software ParaView – through its in situ library, Catalyst –, as if they were any other flow-related variable. Subsequently the tool was extended with the capacity to also show communication data (messages sent between MPI ranks) on top of the CFD mesh. Testing and evaluation are done with two industry-grade codes: Rolls-Royce’s CFD code, Hydra, and Onera, DLR and Airbus’ CFD code, CODA. On the other hand, it has been also noticed that the current performance tools have limited capacity of displaying their data on top of three-dimensional, framed (i.e. time-stepped) representations of the cluster’s topology. Parallel to that, in order for the approach not to be limited to codes which already have the in situ adapter, it was extended to take the performance data and display it – also in codes without in situ – on a three-dimensional, framed representation of the hardware resources being used by the simulation. Testing is done with the Multi-Grid and Block Tri-diagonal NAS Parallel Benchmarks (NPB), as well as with Hydra and CODA again. The benchmarks are used to explain how the new visualizations work, while real performance analyses are done with the industry-grade CFD codes. The proposed solution is able to provide concrete performance insights, which would not have been reached with the current performance tools and which motivated beneficial changes in the respective source code in real life. Finally, its overhead is discussed and proven to be suitable for usage with CFD codes. The dissertation provides a valuable addition to the state of the art of highly parallel CFD performance analysis and serves as basis for further suggested research directions

    P-Pascal : a data-oriented persistent programming language

    Get PDF
    Bibliography: pages 187-199.Persistence is measured by the length of time an object is retained and is usable in a system. Persistent languages extend general purpose languages by providing the full range of persistence for data of any type. Moreover, data which remains on disk after program termination, is manipulated in the same way as transient data. As these languages are based on general purpose programming languages, they tend to be program-centred rather than data-centred. This thesis investigates the inclusion of data-oriented features in a persistent programming language. P-Pascal, a Persistent Pascal, has been designed and implemented to develop techniques for data clustering, metadata maintenance, security enforcement and bulk data management. It introduces type completeness to Pascal and in particular shows how a type-complete set constructor can be provided. This type is shown to be a practical and versatile mechanism for handling bulk data collections in a persistent environment. Relational algebra operators are provided and the automatic optimisation of set expressions is performed by the compiler and the runtime system. The P-Pascal Abstract Machine incorporates two complementary data placement strategies, automatic updating of type information, and metadata query facilities. The protection of data types, primary (named) objects and their individual components is supported. The challenges and opportunities presented by the persistent store organisation are discussed, and techniques for efficiently exploiting these properties are proposed. We also describe the effects on a data-oriented system of treating persistent and transient data alike, so that they cannot be distinguished statically. We conclude that object clustering, metadata maintenance and security enforcement can and should be incorporated in persistent programming languages. The provision of a built-in, type-complete bulk data constructor and its non-procedural operators is demonstrated. We argue that this approach is preferable to engineering such objects on top of a language, because of greater ease of use and considerable opportunity for automatic optimisation. The existence of such a type does not preclude programmers from constructing their own bulk objects using other types - this is but one advantage of a persistent language over a database system

    Numerical aerodynamic simulation facility feasibility study

    Get PDF
    There were three major issues examined in the feasibility study. First, the ability of the proposed system architecture to support the anticipated workload was evaluated. Second, the throughput of the computational engine (the flow model processor) was studied using real application programs. Third, the availability reliability, and maintainability of the system were modeled. The evaluations were based on the baseline systems. The results show that the implementation of the Numerical Aerodynamic Simulation Facility, in the form considered, would indeed be a feasible project with an acceptable level of risk. The technology required (both hardware and software) either already exists or, in the case of a few parts, is expected to be announced this year. Facets of the work described include the hardware configuration, software, user language, and fault tolerance

    GIZMO: A New Class of Accurate, Mesh-Free Hydrodynamic Simulation Methods

    Get PDF
    We present two new Lagrangian methods for hydrodynamics, in a systematic comparison with moving-mesh, SPH, and stationary (non-moving) grid methods. The new methods are designed to simultaneously capture advantages of both smoothed-particle hydrodynamics (SPH) and grid-based/adaptive mesh refinement (AMR) schemes. They are based on a kernel discretization of the volume coupled to a high-order matrix gradient estimator and a Riemann solver acting over the volume 'overlap.' We implement and test a parallel, second-order version of the method with self-gravity & cosmological integration, in the code GIZMO: this maintains exact mass, energy and momentum conservation; exhibits superior angular momentum conservation compared to all other methods we study; does not require 'artificial diffusion' terms; and allows the fluid elements to move with the flow so resolution is automatically adaptive. We consider a large suite of test problems, and find that on all problems the new methods appear competitive with moving-mesh schemes, with some advantages (particularly in angular momentum conservation), at the cost of enhanced noise. The new methods have many advantages vs. SPH: proper convergence, good capturing of fluid-mixing instabilities, dramatically reduced 'particle noise' & numerical viscosity, more accurate sub-sonic flow evolution, & sharp shock-capturing. Advantages vs. non-moving meshes include: automatic adaptivity, dramatically reduced advection errors & numerical overmixing, velocity-independent errors, accurate coupling to gravity, good angular momentum conservation and elimination of 'grid alignment' effects. We can, for example, follow hundreds of orbits of gaseous disks, while AMR and SPH methods break down in a few orbits. However, fixed meshes minimize 'grid noise.' These differences are important for a range of astrophysical problems.Comment: 57 pages, 33 figures. MNRAS. A public version of the GIZMO code, user's guide, test problem setups, and movies are available at http://www.tapir.caltech.edu/~phopkins/Site/GIZMO.htm
    • …
    corecore