
    Scalability analysis of declustering methods for multidimensional range queries

    Efficient storage and retrieval of multiattribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high performance for disk accesses. Although the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper, we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods, and this is corroborated by extensive simulation experiments. From a practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions.
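    Both allocation rules are simple enough to state in a few lines. Below is a minimal sketch (an illustration, not the paper's analysis): it applies Disk Modulo and Fieldwise Xor to the bucket grid of a Cartesian product file and scores a range query by the maximum number of buckets any single disk must serve, the usual parallel-I/O cost model.

```python
from functools import reduce
from itertools import product
from operator import xor

def disk_modulo(bucket, num_disks):
    # Disk Modulo (DM): sum of the bucket's coordinates, mod M.
    return sum(bucket) % num_disks

def fieldwise_xor(bucket, num_disks):
    # Fieldwise Xor (FX): bitwise XOR of the coordinates, mod M.
    return reduce(xor, bucket) % num_disks

def response_time(ranges, allocate, num_disks):
    # A range query touches the Cartesian product of per-attribute
    # index ranges; with one I/O per disk per parallel round, the
    # response time is the largest number of buckets on one disk.
    load = [0] * num_disks
    for bucket in product(*ranges):
        load[allocate(bucket, num_disks)] += 1
    return max(load)

# Example: a 4 x 6 range on a two-attribute file over 8 disks.
ranges = [range(0, 4), range(2, 8)]
print(response_time(ranges, disk_modulo, 8))    # DM response time
print(response_time(ranges, fieldwise_xor, 8))  # FX response time
```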

    A Grouping Genetic Algorithm for Joint Stratification and Sample Allocation Designs

    Predicting the cheapest sample size for the optimal stratification in multivariate survey design is a difficult problem when the population frame is large. A known solution iteratively searches for the minimum sample size necessary to meet accuracy constraints, over partitions of the atomic strata (created by the Cartesian product of the auxiliary variables) into larger strata. The optimal stratification can be found by testing all possible partitions. However, the number of possible partitions grows exponentially with the number of initial strata. There are alternative ways of modelling this problem, one of the most natural being Genetic Algorithms (GAs). These evolutionary algorithms use recombination, mutation and selection to search for optimal solutions, and they often converge on optimal or near-optimal solutions more quickly than exact methods. We propose a new GA approach to this problem using grouping genetic operators instead of traditional operators. The results show a significant improvement in solution quality for similar computational effort, corresponding to large monetary savings.
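    The distinctive ingredient is that the genetic operators act on whole groups of atomic strata rather than on individual genes. The sketch below shows Falkenauer-style grouping crossover and mutation on partitions; the cost function is a toy stand-in (group count) for the paper's real objective, the minimum sample size meeting the accuracy constraints.

```python
import random

def grouping_crossover(parent_a, parent_b):
    # Grouping crossover: transplant one whole group (stratum) from
    # parent_b, then repair by removing the transplanted items from
    # the groups they occupied in parent_a.
    donor = set(random.choice(parent_b))
    child = [g - donor for g in map(set, parent_a)]
    child = [g for g in child if g]   # drop emptied groups
    child.append(donor)
    return child

def grouping_mutation(partition):
    # Move one random atomic stratum to another (possibly new) group.
    part = [set(g) for g in partition]
    src = random.choice(part)
    item = random.choice(sorted(src))
    src.discard(item)
    if len(part) > 1 and random.random() < 0.5:
        random.choice([g for g in part if g is not src]).add(item)
    else:
        part.append({item})
    return [g for g in part if g]

# Toy demo: partition 8 atomic strata. The cost here is just the
# group count, a hypothetical proxy for the evaluated sample size.
random.seed(1)
pop = [[{i} for i in range(8)] for _ in range(10)]
for _ in range(50):
    a, b = random.sample(pop, 2)
    child = grouping_mutation(grouping_crossover(a, b))
    pop.sort(key=len)   # cheapest partitions first
    pop[-1] = child     # replace the worst
print(min(pop, key=len))
```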

    Cross-Layer Optimization of Fast Video Delivery in Cache-Enabled Relaying Networks

    This paper investigates the cross-layer optimization of fast video delivery and caching for minimization of the overall video delivery time in a two-hop relaying network. The half-duplex relay nodes are equipped with both a cache and a buffer, which facilitate joint scheduling of fetching and delivery to exploit the channel diversity and improve the overall delivery performance. The fast delivery control is formulated as a two-stage functional non-convex optimization problem. By exploiting the underlying convex and quasi-convex structures, the problem can be solved exactly and efficiently by the developed algorithm. Simulation results show that significant caching and buffering gains can be achieved with the proposed framework, which translate into a reduction of the overall video delivery time. In addition, a trade-off between caching and buffering gains is revealed.
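    To see informally why the buffer helps, consider the toy greedy scheduler below; it is not the paper's two-stage optimal scheme, and every rate and parameter in it is invented for illustration. In each slot the half-duplex relay either fetches bits over the source-relay hop or delivers buffered bits over the relay-user hop, picking whichever hop is currently stronger; cached bits skip fetching entirely.

```python
import random

def toy_delivery_time(total_bits, cached_bits, seed=0):
    # Greedy half-duplex schedule: per slot, fetch if the first hop is
    # stronger (and fetching remains), otherwise deliver from the buffer.
    rng = random.Random(seed)
    fetched = cached_bits          # bits already at the relay (cache gain)
    delivered = 0.0
    slot = 0
    while delivered < total_bits:
        slot += 1
        r_sr = rng.uniform(0.0, 2.0)   # source->relay rate this slot
        r_ru = rng.uniform(0.0, 2.0)   # relay->user rate this slot
        backlog = fetched - delivered  # bits waiting in the buffer
        if fetched < total_bits and (backlog <= 0 or r_sr > r_ru):
            fetched = min(total_bits, fetched + r_sr)     # fetch
        else:
            delivered = min(fetched, delivered + r_ru)    # deliver
    return slot

print(toy_delivery_time(100, cached_bits=0))    # no caching gain
print(toy_delivery_time(100, cached_bits=50))   # half the file cached
```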

    A Virtual Data Grid for LIGO

    GriPhyN (Grid Physics Network) is a large US collaboration to build grid services for large physics experiments, one of which is LIGO, a gravitational-wave observatory. This paper explains the physics and computing challenges of LIGO, and the tools that GriPhyN will build to address them. A key component needed to implement the data pipeline is a virtual data service: a system to dynamically create data products requested during the various stages. The requested data may already have been processed in a certain way, may reside in a file on a storage system, may be cached, or may need to be created through computation. The full elaboration of this system will allow complex data pipelines to be set up as virtual data objects, with existing data being transformed in diverse ways.
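    A miniature of the virtual-data idea (a hypothetical API, far simpler than what GriPhyN describes): a named data product resolves, in order, to an in-memory cached copy, a file already on storage, or a fresh computation from its declared transformation and inputs, which is then stored for reuse.

```python
import os
import pickle

class VirtualDataObject:
    # A data product named by `name`, derived from `inputs` via
    # `transform`. materialize() checks the cache, then storage,
    # and only computes (and stores) as a last resort.
    _cache = {}

    def __init__(self, name, transform, inputs=()):
        self.name, self.transform, self.inputs = name, transform, inputs

    def materialize(self, store_dir="vdo_store"):
        if self.name in self._cache:              # 1. already cached?
            return self._cache[self.name]
        path = os.path.join(store_dir, self.name)
        if os.path.exists(path):                  # 2. already on storage?
            with open(path, "rb") as fh:
                data = pickle.load(fh)
        else:                                     # 3. compute upstream
            args = [v.materialize(store_dir) for v in self.inputs]
            data = self.transform(*args)
            os.makedirs(store_dir, exist_ok=True)
            with open(path, "wb") as fh:
                pickle.dump(data, fh)
        self._cache[self.name] = data
        return data

# A two-stage toy pipeline: raw samples -> calibrated samples.
raw = VirtualDataObject("raw", lambda: list(range(8)))
calibrated = VirtualDataObject("calibrated",
                               lambda d: [x * 0.5 for x in d], (raw,))
print(calibrated.materialize())
```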

    Foam: A General-Purpose Cellular Monte Carlo Event Generator

    A general-purpose, self-adapting Monte Carlo (MC) event generator (simulator) is described. The high efficiency of the MC, that is, a small maximum weight or variance of the MC weight, is achieved by dividing the integration domain into small cells. The cells can be n-dimensional simplices, hyperrectangles, or Cartesian products of the two. The grid of cells, called the "foam", is produced by binary splits of the cells. The choice of the next cell to be divided and the position/direction of the division hyperplane are driven by an algorithm which optimizes the ratio of the maximum weight to the average weight or (optionally) the total variance. The algorithm can deal, in principle, with an arbitrary pattern of singularities in the distribution. Like any MC generator, it can also be used for MC integration. On a typical personal computer CPU, the program is able to perform adaptive integration/simulation at a relatively small number of dimensions (≤ 16). With the continuing progress in CPU power, this limit will inevitably shift to ever higher dimensions. Foam is intended (and already tested) as a component in MC event generators for high energy physics experiments. A few simple examples of related applications are presented. Foam is written in a fully object-oriented style in the C++ language. Two other versions, with slightly limited functionality, are available in the Fortran77 language. The source codes are available from http://jadach.home.cern.ch/jadach
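    A heavily simplified sketch of the cellular idea, applied to plain MC integration: grow a grid of hyperrectangular cells by binary splits, always splitting the cell with the largest estimated spread (a crude proxy for Foam's optimized max-weight/variance criterion), then sample the cells stratified by the final grid. The split position here is just the midpoint of the widest axis, whereas Foam optimizes it.

```python
import random

def foam_like_integrate(f, dim, n_cells=64, n_samples=2000, seed=0):
    rng = random.Random(seed)

    def explore(lo, hi, k=32):
        # Crude per-cell estimate of the integral and its spread.
        pts = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
               for _ in range(k)]
        vals = [f(p) for p in pts]
        vol = 1.0
        for d in range(dim):
            vol *= hi[d] - lo[d]
        mean = sum(vals) / k
        var = sum((v - mean) ** 2 for v in vals) / k
        return vol * mean, vol * var, lo, hi

    cells = [explore([0.0] * dim, [1.0] * dim)]
    while len(cells) < n_cells:
        cells.sort(key=lambda c: c[1])     # split the "worst" cell
        _, _, lo, hi = cells.pop()
        d = max(range(dim), key=lambda i: hi[i] - lo[i])
        mid = 0.5 * (lo[d] + hi[d])
        hi1 = list(hi); hi1[d] = mid
        lo2 = list(lo); lo2[d] = mid
        cells.append(explore(lo, hi1))
        cells.append(explore(lo2, hi))

    # Final estimate: stratified sampling, one batch per cell.
    total = 0.0
    per_cell = max(1, n_samples // len(cells))
    for est, _, lo, hi in cells:
        vol = 1.0
        for d in range(dim):
            vol *= hi[d] - lo[d]
        s = sum(f([rng.uniform(lo[d], hi[d]) for d in range(dim)])
                for _ in range(per_cell))
        total += vol * s / per_cell
    return total

# A peaked integrand on the unit square.
print(foam_like_integrate(
    lambda p: 1.0 / (0.01 + (p[0] - 0.5) ** 2 + (p[1] - 0.5) ** 2), dim=2))
```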

    Sets and indices in linear programming modelling and their integration with relational data models

    LP models are usually constructed using index sets and data tables, which are closely related to the attributes and relations of relational database (RDB) systems. We extend the syntax of MPL, an existing LP modelling language, in order to connect it to a given RDB system. This approach reuses existing modelling and database software, provides a rich modelling environment, and achieves model and data independence. This integrated software enables Mathematical Programming to be widely used as a decision support tool by unlocking the data residing in corporate databases.
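    The correspondence is direct: an LP index set plays the role of an attribute domain, and an LP data table the role of a relation. The sketch below illustrates the idea with a hypothetical transport model pulled from an in-memory SQLite database; it mimics how a modelling language would expand an indexed expression and is not MPL's actual syntax.

```python
import sqlite3

# Index sets and coefficients live in relations, not in the model file.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE supply(plant TEXT PRIMARY KEY, capacity REAL);
    CREATE TABLE demand(city  TEXT PRIMARY KEY, amount   REAL);
    CREATE TABLE cost(plant TEXT, city TEXT, unit_cost REAL);
    INSERT INTO supply VALUES ('P1', 40), ('P2', 60);
    INSERT INTO demand VALUES ('C1', 30), ('C2', 50);
    INSERT INTO cost VALUES ('P1','C1',2),('P1','C2',3),
                            ('P2','C1',4),('P2','C2',1);
""")

plants = [r[0] for r in con.execute("SELECT plant FROM supply")]  # set I
cities = [r[0] for r in con.execute("SELECT city  FROM demand")]  # set J
c = {(i, j): v for i, j, v in con.execute("SELECT * FROM cost")}  # data

# Expand the indexed objective  min sum_{i,j} c[i,j] * x[i,j]  the way
# a modelling language would, driven entirely by the database contents.
obj = " + ".join(f"{c[i, j]}*x[{i},{j}]" for i in plants for j in cities)
print("minimize", obj)
```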

    2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation

    We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2^18) processors. We present error analysis and scientific application results from a series of more than ten 69-billion-particle (4096^3) cosmological simulations, accounting for 4 × 10^20 floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.
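    The "hashed" in HOT refers to addressing oct-tree cells by Morton (Z-order) keys stored in a hash table, so any processor can name and look up a remote cell by its key. A minimal sketch of that keying scheme follows (the exact key layout here is illustrative; 2HOT's implementation differs in detail).

```python
def morton_key(ix, iy, iz, bits):
    # Interleave the bits of the integer cell coordinates into one
    # Morton (Z-order) key; a leading 1 bit encodes the tree level,
    # so keys of cells at different depths never collide.
    key = 1
    for b in reversed(range(bits)):
        key = ((key << 3)
               | (((ix >> b) & 1) << 2)
               | (((iy >> b) & 1) << 1)
               | ((iz >> b) & 1))
    return key

def parent(key):
    # Dropping three bits moves one level up the oct-tree, so a cell's
    # ancestors are computable from its key alone, without pointers.
    return key >> 3

k = morton_key(5, 3, 7, bits=3)
print(bin(k), bin(parent(k)))   # hash-table lookups would use these keys
```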