83,883 research outputs found

    XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme

    Query evaluation in an XML database requires reconstructing XML subtrees rooted at nodes found by an XML query. Since XML subtree reconstruction can be expensive, one approach to improve query response time is to use reconstruction views - materialized XML subtrees of an XML document, whose nodes are frequently accessed by XML queries. For this approach to be efficient, the principal requirement is a framework for view selection. In this work, we are the first to formalize and study the problem of XML reconstruction view selection. The input is a tree $T$, in which every node $i$ has a size $c_i$ and profit $p_i$, and the size limitation $C$. The target is to find a subset of subtrees rooted at nodes $i_1,\cdots,i_k$ respectively such that $c_{i_1}+\cdots+c_{i_k}\le C$, and $p_{i_1}+\cdots+p_{i_k}$ is maximal. Furthermore, there is no overlap between any two subtrees selected in the solution. We prove that this problem is NP-hard and present a fully polynomial-time approximation scheme (FPTAS) as a solution.
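To make the problem statement above concrete, here is a minimal sketch of an exact dynamic program for it. This is not the FPTAS from the paper: it is a plain pseudo-polynomial knapsack-on-a-tree, and it assumes that sizes are non-negative integers. The function name `best_profit` and the dictionary-based tree encoding are hypothetical, chosen only for illustration.

```python
def best_profit(children, size, profit, root, capacity):
    """Max total profit of non-overlapping subtrees with total size <= capacity.

    children: dict mapping a node to the list of its child nodes (assumed encoding)
    size, profit: dicts giving c_i and p_i for the subtree rooted at node i
    Sizes are assumed to be non-negative integers.
    """
    def solve(v):
        # table[w] = best profit obtainable inside v's subtree with size budget w
        table = [0] * (capacity + 1)
        for child in children.get(v, []):
            child_table = solve(child)
            # Knapsack-style merge: split budget w between earlier children and this child
            table = [
                max(table[w - u] + child_table[u] for u in range(w + 1))
                for w in range(capacity + 1)
            ]
        # Alternative: select the subtree rooted at v itself, which excludes
        # every descendant (no overlap), so it competes with the merged table
        for w in range(size[v], capacity + 1):
            table[w] = max(table[w], profit[v])
        return table

    return solve(root)[capacity]


# Toy tree: node 0 is the root with children 1 and 2
children = {0: [1, 2]}
size = {0: 5, 1: 2, 2: 2}
profit = {0: 6, 1: 4, 2: 3}
print(best_profit(children, size, profit, root=0, capacity=4))
```

With a budget of 4 the root's subtree does not fit, so the sketch picks the two child subtrees. The running time grows with the square of the capacity, which is why an approximation scheme is of interest when $C$ is large.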

    Object Database Scalability for Scientific Workloads

    We describe the PetaByte-scale computing challenges posed by the next generation of particle physics experiments, due to start operation in 2005. The computing models adopted by the experiments call for systems capable of handling sustained data acquisition rates of at least 100 MBytes/second into an Object Database, which will have to handle several PetaBytes of accumulated data per year. The systems will be used to schedule CPU-intensive reconstruction and analysis tasks on the highly complex physics Object data, which then need to be served to clients located at universities and laboratories worldwide. We report on measurements with a prototype system that makes use of a 256 CPU HP Exemplar X Class machine running the Objectivity/DB database. Our results show excellent scalability for up to 240 simultaneous database clients, and aggregate I/O rates exceeding 150 MBytes/second, indicating the viability of the computing models.

    ATLAS Data Challenge 1

    In 2002 the ATLAS experiment started a series of Data Challenges (DC) whose goals are to validate the Computing Model, the complete software suite, and the data model, and to ensure the correctness of the technical choices to be made. A major feature of the first Data Challenge (DC1) was the preparation and the deployment of the software required for the production of large event samples for the High Level Trigger (HLT) and physics communities, and the production of those samples as a world-wide distributed activity. The first phase of DC1 was run during summer 2002, and involved 39 institutes in 18 countries. More than 10 million physics events and 30 million single particle events were fully simulated. Over a period of about 40 calendar days, 71000 CPU-days were used, producing 30 Tbytes of data in about 35000 partitions. In the second phase the next processing step was performed with the participation of 56 institutes in 21 countries (~4000 processors used in parallel). The basic elements of the ATLAS Monte Carlo production system are described. We also present how the software suite was validated and the participating sites were certified. These productions were already partly performed by using different flavours of Grid middleware at ~20 sites. Comment: 10 pages; 3 figures; CHEP03 Conference, San Diego; Reference MOCT00

    A Framework for High-Accuracy Privacy-Preserving Mining

    To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of data records have been proposed recently. In this paper, we present a generalized matrix-theoretic model of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, we demonstrate that (a) the prior techniques differ only in their settings for the model parameters, and (b) through appropriate choice of parameter settings, we can derive new perturbation techniques that provide highly accurate mining results even under strict privacy guarantees. We also propose a novel perturbation mechanism wherein the model parameters are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at a very marginal cost in accuracy. While our model is valid for random-perturbation-based privacy-preserving mining in general, we specifically evaluate its utility here with regard to frequent-itemset mining on a variety of real datasets. The experimental results indicate that our mechanisms incur substantially lower identity and support errors compared to the prior techniques.
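As a rough illustration of what a matrix view of random perturbation can look like, the sketch below uses the simplest case: a single binary attribute that each client keeps with probability `p` and flips otherwise. This is an assumed toy instance, not the mechanism proposed in the paper. The perturbation matrix is `A = [[p, 1-p], [1-p, p]]`, the expected perturbed counts are `Y = A X`, and the miner estimates `X` by inverting `A`. All function names are hypothetical.

```python
def expected_perturbed_counts(x0, x1, p):
    """Expected counts after each value is kept with probability p, flipped otherwise."""
    y0 = p * x0 + (1 - p) * x1
    y1 = (1 - p) * x0 + p * x1
    return y0, y1


def reconstruct_counts(y0, y1, p):
    """Estimate the original counts by applying the inverse of A = [[p, 1-p], [1-p, p]].

    Requires p != 0.5: at p = 0.5 the matrix is singular and nothing can be recovered.
    """
    det = 2 * p - 1
    if det == 0:
        raise ValueError("p = 0.5 makes the perturbation matrix singular")
    x0 = (p * y0 - (1 - p) * y1) / det
    x1 = (p * y1 - (1 - p) * y0) / det
    return x0, x1


# Round trip on expected counts: 30 clients hold value 0, 70 hold value 1
y0, y1 = expected_perturbed_counts(30, 70, p=0.8)
print(reconstruct_counts(y0, y1, p=0.8))
```

In practice the miner sees sampled counts rather than expected ones, so the estimate carries error that grows as `p` approaches 0.5. That is the privacy and accuracy trade-off that the choice of matrix parameters controls.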

    HERA-B Framework for Online Calibration and Alignment

    This paper describes the architecture and implementation of the HERA-B framework for online calibration and alignment. At HERA-B the performance of all trigger levels, including the online reconstruction, strongly depends on using the appropriate calibration and alignment constants, which might change during data taking. A system to monitor, recompute and distribute those constants to online processes has been integrated in the data acquisition and trigger systems. Comment: Submitted to NIM A. 4 figures, 15 pages

    Pinwheel Scheduling for Fault-tolerant Broadcast Disks in Real-time Database Systems

    The design of programs for broadcast disks which incorporate real-time and fault-tolerance requirements is considered. A generalized model for real-time fault-tolerant broadcast disks is defined. It is shown that designing programs for broadcast disks specified in this model is closely related to the scheduling of pinwheel task systems. Some new results in pinwheel scheduling theory are derived, which facilitate the efficient generation of real-time fault-tolerant broadcast disk programs. National Science Foundation (CCR-9308344, CCR-9596282)
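For readers unfamiliar with pinwheel task systems, the sketch below checks the basic pinwheel condition on a cyclic schedule: each task must be served at least once in every window of a given number of consecutive slots. It covers only this basic condition, not the generalized fault-tolerant model defined in the paper. The function name and the reading of slots as broadcast blocks and tasks as files are assumptions made for illustration.

```python
def is_valid_pinwheel_cycle(cycle, windows):
    """Check whether repeating `cycle` forever satisfies every task's window.

    cycle: list of task ids, one per slot (one broadcast period)
    windows: dict mapping a task id to its window length a_i; the task must
             appear at least once in every a_i consecutive slots
    """
    n = len(cycle)
    for task, window in windows.items():
        slots = [t for t, served in enumerate(cycle) if served == task]
        if not slots:
            return False  # a task that is never served violates any window
        for k, t in enumerate(slots):
            nxt = slots[(k + 1) % len(slots)]
            # Distance to the next occurrence, wrapping around the cycle;
            # a task that appears once per cycle has a gap of n
            gap = (nxt - t) % n or n
            if gap > window:
                return False
    return True


# File A must recur within every 2 slots, B and C within every 4 slots
print(is_valid_pinwheel_cycle(["A", "B", "A", "C"], {"A": 2, "B": 4, "C": 4}))
```

The check relies on the fact that a window of length a_i without the task exists exactly when two consecutive occurrences are more than a_i slots apart.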