On the Verge of One Petabyte - the Story Behind the BaBar Database System
The BaBar database has pioneered the use of a commercial ODBMS within the HEP
community. The unique object-oriented architecture of Objectivity/DB has made
it possible to manage over 700 terabytes of production data generated since
May'99, making the BaBar database the world's largest known database. The
ongoing development includes new features, addressing the ever-increasing
luminosity of the detector as well as other changing physics requirements.
Significant efforts are focused on reducing space requirements and operational
costs. The paper discusses our experience with developing a large-scale
database system, emphasizing universal aspects that may be applied to any
large-scale system, independent of the underlying technology used.

Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
conference (CHEP03), La Jolla, CA, USA, March 2003, 6 pages. PSN MOKT01
Scalable Model-Based Management of Correlated Dimensional Time Series in ModelarDB+
To monitor critical infrastructure, high-quality sensors sampled at a high
frequency are increasingly used. However, as they produce huge amounts of data,
only simple aggregates are stored. This removes outliers and fluctuations that
could indicate problems. As a remedy, we present a model-based approach for
managing time series with dimensions that exploits correlation within and among
time series. Specifically, we propose compressing groups of correlated time
series using an extensible set of model types within a user-defined error bound
(possibly zero). We name this new category of model-based compression methods
for time series Multi-Model Group Compression (MMGC). We present GOLEMM, the
first MMGC method, and extend model types to compress time series groups. We propose
primitives for users to effectively define groups for differently sized data
sets, and based on these, an automated grouping method using only the time
series dimensions. We propose algorithms for executing simple and
multi-dimensional aggregate queries on models. Last, we implement our methods
in the Time Series Management System (TSMS) ModelarDB (ModelarDB+). Our
evaluation shows that compared to widely used formats, ModelarDB+ provides up
to 13.7 times faster ingestion due to high compression, 113 times better
compression due to the adaptivity of GOLEMM, 630 times faster aggregates by
using models, and close to linear scalability. It is also extensible and
supports online query processing.

Comment: 12 Pages, 28 Figures, and 1 Table
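The core idea of compressing a time series within a user-defined error bound can be illustrated with a minimal sketch. This is not the actual GOLEMM or ModelarDB+ implementation; it shows a single constant-value model type (similar in spirit to a PMC-Mean-style model), where each segment is replaced by one representative value guaranteed to be within the error bound of every point it stands for. Function names and the greedy segmentation strategy are illustrative assumptions.

```python
# Illustrative sketch only, NOT the GOLEMM method: lossy model-based
# compression of one time series using a constant-value model type
# within a user-defined absolute error bound (possibly zero).

def compress_constant(values, error_bound):
    """Greedily segment `values`; each segment is stored as
    (length, representative) with max absolute error <= error_bound."""
    segments = []
    lo = hi = values[0]
    count = 1
    for v in values[1:]:
        new_lo, new_hi = min(lo, v), max(hi, v)
        # The midrange of a segment is within error_bound of every point
        # iff the segment's value range spans at most 2 * error_bound.
        if new_hi - new_lo <= 2 * error_bound:
            lo, hi = new_lo, new_hi
            count += 1
        else:
            segments.append((count, (lo + hi) / 2))  # close segment
            lo = hi = v
            count = 1
    segments.append((count, (lo + hi) / 2))
    return segments

def decompress(segments):
    """Reconstruct an approximate series from (length, value) segments."""
    return [rep for count, rep in segments for _ in range(count)]

series = [10.0, 10.2, 9.9, 20.0, 20.1]
segments = compress_constant(series, error_bound=0.5)
approx = decompress(segments)
```

With an error bound of 0, each segment would hold only identical values, making the compression lossless; the abstract's aggregate queries could then be answered directly on the (length, value) segments without reconstructing the raw points.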
Data assurance in opaque computations
The chess endgame is increasingly being seen through the lens of, and therefore effectively defined by, a data ‘model’ of itself. It is vital that such models are clearly faithful to the reality they purport to represent. This paper examines that issue and systems engineering responses to it, using the chess endgame as the exemplar scenario. A structured survey has been carried out of the intrinsic challenges and complexity of creating endgame data, by reviewing the past pattern of errors: those caught during work in progress, those surfacing in publications, and those occurring after the data was generated. Specific measures are proposed to counter the observed classes of error-risk, including a preliminary survey of techniques for using state-of-the-art verification tools to generate EGTs that are correct by construction. The approach may be applied generically beyond the game domain.