XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme
Query evaluation in an XML database requires reconstructing XML subtrees
rooted at nodes found by an XML query. Since XML subtree reconstruction can be
expensive, one approach to improve query response time is to use reconstruction
views - materialized XML subtrees of an XML document, whose nodes are
frequently accessed by XML queries. For this approach to be efficient, the
principal requirement is a framework for view selection. In this work, we are
the first to formalize and study the problem of XML reconstruction view
selection. The input is a tree $T$ in which every node $v$ has a size $s(v)$ and
a profit $p(v)$, together with a size limit $L$. The target is to find a set of
subtrees rooted at nodes $v_1, \ldots, v_k$ such that
$\sum_{i=1}^{k} s(v_i) \le L$ and $\sum_{i=1}^{k} p(v_i)$ is maximal.
Furthermore, there is no overlap between any two subtrees selected in the
solution. We prove that this problem is NP-hard and present a fully
polynomial-time approximation scheme (FPTAS) as a solution.
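To make the optimization problem concrete, here is a minimal exact-search sketch in Python, assuming an illustrative Node structure with per-node size and profit. It enumerates non-overlapping subtree selections within the budget, which is exponential in the worst case (the problem is NP-hard); the paper's FPTAS would replace this exact search with a profit-scaling scheme.

    class Node:
        def __init__(self, size, profit, children=()):
            self.size = size          # cost of materializing the subtree rooted here
            self.profit = profit      # benefit (access frequency) of materializing it
            self.children = list(children)

    def best_profit(root, budget):
        """Maximum total profit of non-overlapping subtrees with sizes summing to <= budget."""
        def solve(candidates, remaining):
            # `candidates` holds roots of pairwise-disjoint subtrees still undecided.
            if not candidates:
                return 0
            first, rest = candidates[0], candidates[1:]
            # Option 1: do not materialize `first`; its children become candidates.
            best = solve(tuple(first.children) + rest, remaining)
            # Option 2: materialize the subtree at `first`; its descendants are
            # then excluded, which enforces the non-overlap requirement.
            if first.size <= remaining:
                best = max(best, first.profit + solve(rest, remaining - first.size))
            return best
        return solve((root,), budget)

    # Tiny example: taking the two leaves (profit 9) beats the root, which exceeds the budget.
    leaves = [Node(2, 4), Node(2, 5)]
    print(best_profit(Node(5, 5, leaves), budget=4))  # 9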
Object Database Scalability for Scientific Workloads
We describe the PetaByte-scale computing challenges posed by the next generation of particle physics experiments, due to start operation in 2005. The computing models adopted by the experiments call for systems capable of handling sustained data acquisition rates of at least 100 MBytes/second into an Object Database, which will have to handle several PetaBytes of accumulated data per year. The systems will be used to schedule CPU-intensive reconstruction and analysis tasks on the highly complex physics Object data, which must then be served to clients located at universities and laboratories worldwide. We report on measurements with a prototype system that makes use of a 256-CPU HP Exemplar X Class machine running the Objectivity/DB database. Our results show excellent scalability for up to 240 simultaneous database clients, and aggregate I/O rates exceeding 150 MBytes/second, indicating the viability of the computing models.
ATLAS Data Challenge 1
In 2002 the ATLAS experiment started a series of Data Challenges (DC) whose
goals are to validate the Computing Model, the complete software suite, and the
data model, and to ensure the correctness of the technical choices to be made.
A major feature of the first Data Challenge (DC1)
was the preparation and the deployment of the software required for the
production of large event samples for the High Level Trigger (HLT) and physics
communities, and the production of those samples as a world-wide distributed
activity. The first phase of DC1 was run during summer 2002, and involved 39
institutes in 18 countries. More than 10 million physics events and 30 million
single particle events were fully simulated. Over a period of about 40 calendar
days 71000 CPU-days were used producing 30 Tbytes of data in about 35000
partitions. In the second phase the next processing step was performed with the
participation of 56 institutes in 21 countries (~ 4000 processors used in
parallel). The basic elements of the ATLAS Monte Carlo production system are
described. We also present how the software suite was validated and the
participating sites were certified. These productions were already partly
performed by using different flavours of Grid middleware at ~ 20 sites.
Comment: 10 pages; 3 figures; CHEP03 Conference, San Diego; Reference MOCT00
A Framework for High-Accuracy Privacy-Preserving Mining
To preserve client privacy in the data mining process, a variety of
techniques based on random perturbation of data records have been proposed
recently. In this paper, we present a generalized matrix-theoretic model of
random perturbation, which facilitates a systematic approach to the design of
perturbation mechanisms for privacy-preserving mining. Specifically, we
demonstrate that (a) the prior techniques differ only in their settings for the
model parameters, and (b) through appropriate choice of parameter settings, we
can derive new perturbation techniques that provide highly accurate mining
results even under strict privacy guarantees. We also propose a novel
perturbation mechanism wherein the model parameters are themselves
characterized as random variables, and demonstrate that this feature provides
significant improvements in privacy at a very marginal cost in accuracy.
While our model is valid for random-perturbation-based privacy-preserving
mining in general, we specifically evaluate its utility here with regard to
frequent-itemset mining on a variety of real datasets. The experimental results
indicate that our mechanisms incur substantially lower identity and support
errors as compared to the prior techniques.
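As a concrete instance of a matrix-theoretic perturbation model, the sketch below uses a classical "gamma-diagonal" matrix (retain a category with probability gamma, otherwise switch to a uniformly random other category) and reconstructs the original category counts by inverting the matrix. The matrix form, the parameter name gamma, and the reconstruction step are illustrative stand-ins for the paper's general framework, not its exact mechanisms.

    import numpy as np

    rng = np.random.default_rng(0)

    def gamma_matrix(k, gamma):
        """k x k perturbation matrix: keep a category w.p. gamma, otherwise
        move to a uniformly random other category (an illustrative instance
        of a matrix perturbation model; prior schemes correspond to
        particular parameter settings)."""
        off = (1.0 - gamma) / (k - 1)
        return np.full((k, k), off) + (gamma - off) * np.eye(k)

    def perturb(records, A):
        """Replace each categorical record i by j with probability A[j, i]."""
        k = A.shape[0]
        return np.array([rng.choice(k, p=A[:, r]) for r in records])

    def reconstruct_counts(perturbed, A):
        """Estimate original category counts: E[y] = A @ x, so x_hat = A^{-1} y."""
        k = A.shape[0]
        y = np.bincount(perturbed, minlength=k).astype(float)
        return np.linalg.solve(A, y)

    # Tiny demo: 10,000 records over 4 categories.
    true = rng.choice(4, size=10_000, p=[0.5, 0.3, 0.15, 0.05])
    A = gamma_matrix(4, gamma=0.6)
    noisy = perturb(true, A)
    print(np.bincount(true, minlength=4))
    print(reconstruct_counts(noisy, A).round())

Since the expected perturbed counts satisfy E[y] = A x for the true counts x, an invertible A lets the miner recover aggregate distributions without ever seeing an individual record exactly, which is the accuracy/privacy trade-off the abstract describes.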
HERA-B Framework for Online Calibration and Alignment
This paper describes the architecture and implementation of the HERA-B
framework for online calibration and alignment. At HERA-B the performance of
all trigger levels, including the online reconstruction, strongly depends on
using the appropriate calibration and alignment constants, which might change
during data taking. A system to monitor, recompute and distribute those
constants to online processes has been integrated in the data acquisition and
trigger systems.
Comment: Submitted to NIM A. 4 figures, 15 pages
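The monitor-recompute-distribute cycle the abstract describes can be pictured with a small versioned-store sketch; the class name, fields, and callback scheme below are hypothetical and stand in for whatever messaging the HERA-B data acquisition system actually uses.

    import threading
    from dataclasses import dataclass, field

    @dataclass
    class CalibrationStore:
        """Versioned store of constants: online processes register callbacks
        and are notified whenever a recomputed set is published."""
        version: int = 0
        constants: dict = field(default_factory=dict)
        _subscribers: list = field(default_factory=list)
        _lock: threading.Lock = field(default_factory=threading.Lock)

        def subscribe(self, callback):
            self._subscribers.append(callback)

        def publish(self, new_constants):
            # Called when monitoring decides the constants have drifted
            # and a recomputation has produced a fresh set.
            with self._lock:
                self.version += 1
                self.constants = dict(new_constants)
                snapshot = (self.version, self.constants)
            for notify in self._subscribers:
                notify(*snapshot)   # distribute to online processes

    store = CalibrationStore()
    store.subscribe(lambda v, c: print(f"reloading constants v{v}: {c}"))
    store.publish({"vertex_alignment_x": 0.012})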
Pinwheel Scheduling for Fault-tolerant Broadcast Disks in Real-time Database Systems
The design of programs for broadcast disks which incorporate real-time and fault-tolerance requirements is considered. A generalized model for real-time fault-tolerant broadcast disks is defined. It is shown that designing programs for broadcast disks specified in this model is closely related to the scheduling of pinwheel task systems. Some new results in pinwheel scheduling theory are derived, which facilitate the efficient generation of real-time fault-tolerant broadcast disk programs.
National Science Foundation (CCR-9308344, CCR-9596282)
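For reference, a pinwheel task system assigns each task i an integer window a_i, and a valid cyclic schedule serves task i at least once in every a_i consecutive slots; read as a broadcast-disk program, the slots are broadcast ticks and a_i is the longest a client may ever wait for item i. The checker below is a minimal illustration of that definition (the item names and windows are made up), not the paper's generation algorithm.

    def satisfies_pinwheel(schedule, windows):
        """Check that cyclic `schedule` (a list of task ids) serves each task
        at least once in every window of windows[task] consecutive slots."""
        n = len(schedule)
        for task, w in windows.items():
            # Examine every window of length w on the cyclic schedule.
            for start in range(n):
                if task not in (schedule[(start + k) % n] for k in range(w)):
                    return False
        return True

    # Broadcast-disk reading: windows[i] bounds how long a client may wait
    # for item i (its real-time constraint). Densities: 1/2 + 1/4 + 1/4 = 1.
    windows = {"a": 2, "b": 4, "c": 4}
    print(satisfies_pinwheel(["a", "b", "a", "c"], windows))  # True
    print(satisfies_pinwheel(["a", "a", "b", "c"], windows))  # False: the
    # window (b, c) contains no "a", so item "a" misses its deadline.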