Scalability analysis of declustering methods for multidimensional range queries
Abstract—Efficient storage and retrieval of multiattribute data sets have become essential requirements for many data-intensive applications. The Cartesian product file is known to be an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high disk-access performance. Although the scalability of declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies of it have been done so far. In this paper, we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, the most common type of query. These formulas disclose the limited scalability of the declustering methods, a finding corroborated by extensive simulation experiments. From a practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions.
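The two methods named in the abstract can be sketched concretely. In this hypothetical Python sketch, a bucket is a tuple of per-attribute indices, Disk Modulo sums the indices modulo the number of disks, and Fieldwise Xor combines them bitwise; the response-time proxy used here (maximum number of buckets read from any single disk) is an illustrative measure, not the paper's derived formula.

```python
from functools import reduce
from itertools import product

def disk_modulo(coords, m):
    # Disk Modulo (DM): sum of the bucket coordinates modulo the disk count.
    return sum(coords) % m

def fieldwise_xor(coords, m):
    # Fieldwise Xor (FX): bitwise XOR of the coordinates, reduced modulo m.
    return reduce(lambda a, b: a ^ b, coords) % m

def range_query_cost(ranges, m, assign):
    # Response-time proxy: the maximum number of buckets any one disk
    # must read for the given per-attribute index ranges.
    load = [0] * m
    for coords in product(*ranges):
        load[assign(coords, m)] += 1
    return max(load)

# A 4x4 range query on a 2-attribute file declustered over 4 disks:
# both methods achieve the ideal load of 16 buckets / 4 disks = 4.
q = [range(4), range(4)]
print(range_query_cost(q, 4, disk_modulo))
print(range_query_cost(q, 4, fieldwise_xor))
```

For this small, aligned query both heuristics balance the load perfectly; the paper's point is that such balance degrades as the number of disks grows.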
A Grouping Genetic Algorithm for Joint Stratification and Sample Allocation Designs
Predicting the cheapest sample size for the optimal stratification in
multivariate survey design is a problem in cases where the population frame is
large. A solution exists that iteratively searches for the minimum sample size
necessary to meet accuracy constraints in partitions of the atomic strata
(created by the Cartesian product of auxiliary variables) into larger strata. The optimal
stratification can be found by testing all possible partitions. However, the
number of possible partitions grows exponentially with the number of initial
strata. There are alternative ways of modelling this problem; one of the most
natural uses Genetic Algorithms (GAs). These evolutionary algorithms use
recombination, mutation and selection to search for optimal solutions, and they
often converge on optimal or near-optimal solutions more quickly than exact
methods. We propose a new GA approach to this problem using grouping genetic
operators instead of traditional operators. The results show a significant
improvement in solution quality for similar computational effort, corresponding
to large monetary savings. Comment: 22 page
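A minimal sketch of the grouping idea described above, assuming a chromosome that maps each atomic stratum to a group label; the merge-style mutation acting on whole groups (rather than single genes) is an illustrative stand-in for the paper's grouping operators, not their actual definition.

```python
import random

random.seed(0)

# Grouping encoding: chromosome[i] is the group (final stratum) that
# atomic stratum i belongs to. Grouping genetic algorithms apply
# operators to whole groups instead of individual genes.
def random_partition(n_atomic, n_groups):
    return [random.randrange(n_groups) for _ in range(n_atomic)]

def group_merge_mutation(chrom, n_groups):
    # Move every member of one randomly chosen group into another group,
    # i.e. merge two strata -- a group-level, not gene-level, operator.
    src, dst = random.sample(range(n_groups), 2)
    return [dst if g == src else g for g in chrom]

chrom = random_partition(8, 3)
mutated = group_merge_mutation(chrom, 3)
print(chrom, mutated)
```

A full GA would combine such group-level mutation with a grouping crossover and a fitness function returning the sample size needed to meet the accuracy constraints.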
Cross-Layer Optimization of Fast Video Delivery in Cache-Enabled Relaying Networks
This paper investigates the cross-layer optimization of fast video delivery
and caching for minimization of the overall video delivery time in a two-hop
relaying network. The half-duplex relay nodes are equipped with both a cache
and a buffer which facilitate joint scheduling of fetching and delivery to
exploit the channel diversity for improving the overall delivery performance.
The fast delivery control is formulated as a two-stage functional non-convex
optimization problem. By exploiting the underlying convex and quasi-convex
structures, the problem can be solved exactly and efficiently by the developed
algorithm. Simulation results show that significant caching and buffering gains
can be achieved with the proposed framework, which translates into a reduction
of the overall video delivery time. Besides, a trade-off between caching and
buffering gains is unveiled. Comment: 7 pages, 4 figures; accepted for presentation at IEEE Globecom, San
Diego, CA, Dec. 201
A Virtual Data Grid for LIGO
GriPhyN (Grid Physics Network) is a large US collaboration to
build grid services for large physics experiments, one of which is LIGO, a
gravitational-wave observatory. This paper explains the physics and computing
challenges of LIGO, and the tools that GriPhyN will build to address
them. A key component needed to implement the data pipeline is a virtual
data service: a system that dynamically creates data products requested during
the various stages of the pipeline. The requested data may already have been
processed in a certain way, may reside in a file on a storage system, may be
cached, or may need to be created through computation. The full elaboration of
this system will allow complex data pipelines to be set up as virtual data
objects, with existing data being transformed in diverse ways.
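The virtual data idea (reuse a product if it already exists in storage, otherwise create it through computation and record it) can be caricatured with a local file cache. The function and product names below are hypothetical; a real virtual data service would also track provenance and distributed replicas.

```python
import hashlib
import os
import pickle
import tempfile

def materialize(name, derive, cache_dir):
    """Return the requested data product: reuse it if it already exists
    in storage, otherwise create it by computation and record it."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, hashlib.sha1(name.encode()).hexdigest())
    if os.path.exists(path):               # already materialized earlier
        with open(path, "rb") as f:
            return pickle.load(f)
    result = derive()                      # create through computation
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

cache = tempfile.mkdtemp()                 # local stand-in for grid storage
calls = []
first = materialize("strain-spectrum", lambda: calls.append(1) or [1, 2, 3], cache)
second = materialize("strain-spectrum", lambda: calls.append(1) or [9], cache)
print(second, len(calls))                  # second request served from storage
```

Only the first request triggers computation; the second is satisfied from the recorded product, which is the essence of a virtual data object.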
Foam: A General-Purpose Cellular Monte Carlo Event Generator
A general-purpose, self-adapting Monte Carlo (MC) event generator
(simulator) is described. The high efficiency of the MC, that is, a small
maximum weight or variance of the MC weight, is achieved by dividing the
integration domain into small cells. The cells can be n-dimensional
simplices, hyperrectangles, or Cartesian products of them. The grid of cells,
called a "foam", is produced by binary splits of the cells. The choice of the
next cell to be divided and the position/direction of the division hyperplane
are driven by an algorithm that optimizes the ratio of the maximum weight to
the average weight or (optionally) the total variance. The algorithm is able
to deal, in principle, with an arbitrary pattern of singularities in the
distribution. Like any MC generator, it can also be used for MC integration.
On a typical personal computer CPU, the program is able to perform adaptive
integration/simulation for a relatively small number of dimensions. With the
continuing progress in CPU power, this limit will inevitably shift to ever
higher dimensions. Foam is aimed (and already tested) as a component in MC
event generators for high-energy physics experiments. A few simple examples
of related applications are presented. Foam is written in a fully
object-oriented style in the C++ language. Two other versions, with slightly
limited functionality, are available in the Fortran77 language. The source
codes are available from http://jadach.home.cern.ch/jadach
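The cell-splitting idea can be caricatured in one dimension. The merit function below (sampled maximum weight times cell width) and all parameters are assumptions for illustration, not Foam's actual optimization of the max-to-average weight ratio; the sketch only shows the split-then-sample structure.

```python
import random

random.seed(7)

def foam_style_integrate(f, n_cells=64, n_explore=100, n_final=20000):
    """Loose 1-D caricature of the foam idea: repeatedly binary-split the
    cell with the largest sampled figure of merit, then form a stratified
    MC estimate over the resulting grid of cells."""
    cells = [(0.0, 1.0)]
    while len(cells) < n_cells:
        def merit(cell):
            # Crude estimate of (max weight in cell) x (cell width).
            a, b = cell
            m = max(f(a + (b - a) * random.random()) for _ in range(n_explore))
            return m * (b - a)
        i = max(range(len(cells)), key=lambda k: merit(cells[k]))
        a, b = cells.pop(i)
        mid = 0.5 * (a + b)
        cells += [(a, mid), (mid, b)]       # binary split of the chosen cell
    # Stratified estimate: sample uniformly inside each cell.
    per_cell = n_final // len(cells)
    total = 0.0
    for a, b in cells:
        s = sum(f(a + (b - a) * random.random()) for _ in range(per_cell))
        total += (b - a) * s / per_cell
    return total

estimate = foam_style_integrate(lambda x: 3.0 * x * x)  # exact integral on [0,1] is 1
print(round(estimate, 3))
```

The split criterion concentrates cells where the integrand is large, which is what reduces the maximum weight (and hence raises the generation efficiency) in the real algorithm.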
Sets and indices in linear programming modelling and their integration with relational data models
LP models are usually constructed using index sets and data tables which are closely related to the attributes and relations of relational database (RDB) systems. We extend the syntax of MPL, an existing LP modelling language, in order to connect it to a given RDB system. This approach reuses existing modelling and database software, provides a rich modelling environment and achieves model and data independence. This integrated software enables Mathematical Programming to be widely used as a decision support tool by unlocking the data residing in corporate databases
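The correspondence between relational attributes and LP index sets can be illustrated with a toy table; the table, column names, and the MPL-like comments below are hypothetical and not the paper's actual syntax extension.

```python
import sqlite3

# Hypothetical corporate relation standing in for an RDB table; its key
# attribute becomes an LP index set and its value column a data table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE supply (plant TEXT PRIMARY KEY, capacity REAL)")
con.executemany("INSERT INTO supply VALUES (?, ?)",
                [("dublin", 120.0), ("leeds", 80.0)])

# INDEX  PLANT          := the `plant` attribute of relation `supply`
# DATA   capacity[PLANT] := the `capacity` attribute, keyed by PLANT
PLANT = [row[0] for row in con.execute("SELECT plant FROM supply ORDER BY plant")]
capacity = {p: c for p, c in con.execute("SELECT plant, capacity FROM supply")}

# A modeller can now write constraints such as x[p] <= capacity[p] for p in PLANT.
print(PLANT, capacity["leeds"])
```

Because the index set and data table are read straight from the relation, the model stays independent of the data: updating the database automatically updates the LP instance.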
2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation
We report on improvements made over the past two decades to our adaptive
treecode N-body method (HOT). A mathematical and computational approach to the
cosmological N-body problem is described, with performance and scalability
measured up to 256k (2^18) processors. We present error analysis and
scientific application results from a series of more than ten 69-billion
particle cosmological simulations, accounting for the floating point
operations performed. These results include the first simulations using
the new constraints on the standard model of cosmology from the Planck
satellite. Our simulations set a new standard for accuracy and scientific
throughput, while meeting or exceeding the computational efficiency of the
latest generation of hybrid TreePM N-body methods. Comment: 12 pages, 8 figures, 77 references; To appear in Proceedings of SC
'1
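The "hashed" part of the method refers to indexing tree cells by spatial keys. A common construction, shown here as a sketch rather than the paper's exact key scheme, interleaves the bits of integer particle coordinates into a Morton key.

```python
def morton_key(ix, iy, iz, bits=10):
    # Interleave the bits of three integer coordinates into a single key.
    # Hashed oct-tree (HOT) codes index tree cells with keys of this kind,
    # so sorting particles by key groups spatial neighbours together.
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key

# Nearby coordinates map to nearby keys, which drives both the hash-table
# lookup of cells and a space-filling-curve style domain decomposition.
print(morton_key(3, 0, 0), morton_key(0, 3, 0), morton_key(1, 1, 1))
```

Truncating a key by three bits gives the key of the parent cell, which is why a flat hash table over such keys can replace an explicit pointer-based octree.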