XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme
Query evaluation in an XML database requires reconstructing XML subtrees
rooted at nodes found by an XML query. Since XML subtree reconstruction can be
expensive, one approach to improve query response time is to use reconstruction
views - materialized XML subtrees of an XML document, whose nodes are
frequently accessed by XML queries. For this approach to be efficient, the
principal requirement is a framework for view selection. In this work, we are
the first to formalize and study the problem of XML reconstruction view
selection. The input is a tree $T$ in which every node $v$ has a size $s(v)$ and
a profit $p(v)$, together with a size limit $L$. The target is to find a set of
subtrees rooted at nodes $v_1, \ldots, v_k$ such that
$\sum_{i=1}^{k} s(v_i) \le L$ and $\sum_{i=1}^{k} p(v_i)$ is maximal.
Furthermore, there is no overlap between any two subtrees selected in the
solution. We prove that this problem is NP-hard and present a fully
polynomial-time approximation scheme (FPTAS) as a solution.
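To make the optimization problem concrete, here is a minimal exact-search sketch in Python, assuming an illustrative Node structure with per-node size and profit. It enumerates non-overlapping subtree selections within the budget, which is exponential in the worst case (the problem is NP-hard); the paper's FPTAS would replace this exact search with a profit-scaling scheme.

    class Node:
        def __init__(self, size, profit, children=()):
            self.size = size          # cost of materializing the subtree rooted here
            self.profit = profit      # benefit (access frequency) of materializing it
            self.children = list(children)

    def best_profit(root, budget):
        """Maximum total profit of non-overlapping subtrees with sizes summing to <= budget."""
        def solve(candidates, remaining):
            # `candidates` holds roots of pairwise-disjoint subtrees still undecided.
            if not candidates:
                return 0
            first, rest = candidates[0], candidates[1:]
            # Option 1: do not materialize `first`; its children become candidates.
            best = solve(tuple(first.children) + rest, remaining)
            # Option 2: materialize the subtree at `first`; its descendants are
            # then excluded, which enforces the non-overlap requirement.
            if first.size <= remaining:
                best = max(best, first.profit + solve(rest, remaining - first.size))
            return best
        return solve((root,), budget)

    # Tiny example: taking the two leaves (profit 9) beats the root, which exceeds the budget.
    leaves = [Node(2, 4), Node(2, 5)]
    print(best_profit(Node(5, 5, leaves), budget=4))  # 9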
Object Database Scalability for Scientific Workloads
We describe the PetaByte-scale computing challenges posed by the next generation of particle physics experiments, due to start operation in 2005. The computing models adopted by the experiments call for systems capable of handling sustained data acquisition rates of at least 100 MBytes/second into an Object Database, which will have to handle several PetaBytes of accumulated data per year. The systems will be used to schedule CPU-intensive reconstruction and analysis tasks on the highly complex physics Object data, which must then be served to clients located at universities and laboratories worldwide. We report on measurements with a prototype system that makes use of a 256-CPU HP Exemplar X Class machine running the Objectivity/DB database. Our results show excellent scalability for up to 240 simultaneous database clients, and aggregate I/O rates exceeding 150 MBytes/second, indicating the viability of the computing models.
ATLAS Data Challenge 1
In 2002 the ATLAS experiment started a series of Data Challenges (DC) whose
goals are to validate the Computing Model, the complete software suite, and the
data model, and to ensure the correctness of the technical choices to be made.
A major feature of the first Data Challenge (DC1)
was the preparation and the deployment of the software required for the
production of large event samples for the High Level Trigger (HLT) and physics
communities, and the production of those samples as a world-wide distributed
activity. The first phase of DC1 was run during summer 2002, and involved 39
institutes in 18 countries. More than 10 million physics events and 30 million
single particle events were fully simulated. Over a period of about 40 calendar
days 71000 CPU-days were used producing 30 Tbytes of data in about 35000
partitions. In the second phase the next processing step was performed with the
participation of 56 institutes in 21 countries (~ 4000 processors used in
parallel). The basic elements of the ATLAS Monte Carlo production system are
described. We also present how the software suite was validated and the
participating sites were certified. These productions were already partly
performed by using different flavours of Grid middleware at ~ 20 sites.
Comment: 10 pages; 3 figures; CHEP03 Conference, San Diego; Reference MOCT00
A Framework for High-Accuracy Privacy-Preserving Mining
To preserve client privacy in the data mining process, a variety of
techniques based on random perturbation of data records have been proposed
recently. In this paper, we present a generalized matrix-theoretic model of
random perturbation, which facilitates a systematic approach to the design of
perturbation mechanisms for privacy-preserving mining. Specifically, we
demonstrate that (a) the prior techniques differ only in their settings for the
model parameters, and (b) through appropriate choice of parameter settings, we
can derive new perturbation techniques that provide highly accurate mining
results even under strict privacy guarantees. We also propose a novel
perturbation mechanism wherein the model parameters are themselves
characterized as random variables, and demonstrate that this feature provides
significant improvements in privacy at a very marginal cost in accuracy.
While our model is valid for random-perturbation-based privacy-preserving
mining in general, we specifically evaluate its utility here with regard to
frequent-itemset mining on a variety of real datasets. The experimental results
indicate that our mechanisms incur substantially lower identity and support
errors as compared to the prior techniques.
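As a concrete instance of a matrix-theoretic perturbation model, the sketch below uses a classical "gamma-diagonal" matrix (retain a category with probability gamma, otherwise switch to a uniformly random other category) and reconstructs the original category counts by inverting the matrix. The matrix form, the parameter name gamma, and the reconstruction step are illustrative stand-ins for the paper's general framework, not its exact mechanisms.

    import numpy as np

    rng = np.random.default_rng(0)

    def gamma_matrix(k, gamma):
        """k x k perturbation matrix: keep a category w.p. gamma, otherwise
        move to a uniformly random other category (an illustrative instance
        of a matrix perturbation model; prior schemes correspond to
        particular parameter settings)."""
        off = (1.0 - gamma) / (k - 1)
        return np.full((k, k), off) + (gamma - off) * np.eye(k)

    def perturb(records, A):
        """Replace each categorical record i by j with probability A[j, i]."""
        k = A.shape[0]
        return np.array([rng.choice(k, p=A[:, r]) for r in records])

    def reconstruct_counts(perturbed, A):
        """Estimate original category counts: E[y] = A @ x, so x_hat = A^{-1} y."""
        k = A.shape[0]
        y = np.bincount(perturbed, minlength=k).astype(float)
        return np.linalg.solve(A, y)

    # Tiny demo: 10,000 records over 4 categories.
    true = rng.choice(4, size=10_000, p=[0.5, 0.3, 0.15, 0.05])
    A = gamma_matrix(4, gamma=0.6)
    noisy = perturb(true, A)
    print(np.bincount(true, minlength=4))
    print(reconstruct_counts(noisy, A).round())

Since the expected perturbed counts satisfy E[y] = A x for the true counts x, an invertible A lets the miner recover aggregate distributions without ever seeing an individual record exactly, which is the accuracy/privacy trade-off the abstract describes.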
HERA-B Framework for Online Calibration and Alignment
This paper describes the architecture and implementation of the HERA-B
framework for online calibration and alignment. At HERA-B the performance of
all trigger levels, including the online reconstruction, strongly depends on
using the appropriate calibration and alignment constants, which might change
during data taking. A system to monitor, recompute and distribute those
constants to online processes has been integrated in the data acquisition and
trigger systems.
Comment: Submitted to NIM A. 4 figures, 15 pages
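The monitor-recompute-distribute cycle the abstract describes can be pictured with a small versioned-store sketch; the class name, fields, and callback scheme below are hypothetical and stand in for whatever messaging the HERA-B data acquisition system actually uses.

    import threading
    from dataclasses import dataclass, field

    @dataclass
    class CalibrationStore:
        """Versioned store of constants: online processes register callbacks
        and are notified whenever a recomputed set is published."""
        version: int = 0
        constants: dict = field(default_factory=dict)
        _subscribers: list = field(default_factory=list)
        _lock: threading.Lock = field(default_factory=threading.Lock)

        def subscribe(self, callback):
            self._subscribers.append(callback)

        def publish(self, new_constants):
            # Called when monitoring decides the constants have drifted
            # and a recomputation has produced a fresh set.
            with self._lock:
                self.version += 1
                self.constants = dict(new_constants)
                snapshot = (self.version, self.constants)
            for notify in self._subscribers:
                notify(*snapshot)   # distribute to online processes

    store = CalibrationStore()
    store.subscribe(lambda v, c: print(f"reloading constants v{v}: {c}"))
    store.publish({"vertex_alignment_x": 0.012})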
Pinwheel Scheduling for Fault-tolerant Broadcast Disks in Real-time Database Systems
The design of programs for broadcast disks which incorporate real-time and fault-tolerance requirements is considered. A generalized model for real-time fault-tolerant broadcast disks is defined. It is shown that designing programs for broadcast disks specified in this model is closely related to the scheduling of pinwheel task systems. Some new results in pinwheel scheduling theory are derived, which facilitate the efficient generation of real-time fault-tolerant broadcast disk programs.
National Science Foundation (CCR-9308344, CCR-9596282)
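For reference, a pinwheel task system assigns each task i an integer window a_i, and a valid cyclic schedule serves task i at least once in every a_i consecutive slots; read as a broadcast-disk program, the slots are broadcast ticks and a_i is the longest a client may ever wait for item i. The checker below is a minimal illustration of that definition (the item names and windows are made up), not the paper's generation algorithm.

    def satisfies_pinwheel(schedule, windows):
        """Check that cyclic `schedule` (a list of task ids) serves each task
        at least once in every window of windows[task] consecutive slots."""
        n = len(schedule)
        for task, w in windows.items():
            # Examine every window of length w on the cyclic schedule.
            for start in range(n):
                if task not in (schedule[(start + k) % n] for k in range(w)):
                    return False
        return True

    # Broadcast-disk reading: windows[i] bounds how long a client may wait
    # for item i (its real-time constraint). Densities: 1/2 + 1/4 + 1/4 = 1.
    windows = {"a": 2, "b": 4, "c": 4}
    print(satisfies_pinwheel(["a", "b", "a", "c"], windows))  # True
    print(satisfies_pinwheel(["a", "a", "b", "c"], windows))  # False: the
    # window (b, c) contains no "a", so item "a" misses its deadline.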